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Abstract — In  this  paper  we  present  adaptive  and  distributed 
algorithms  for  motion  coordination  of  a  group  of  m  vehicles. 
The  vehicles  must  service  demands  whose  time  of  arrival,  spatial 
location  and  service  requirement  are  stochastic;  the  objective  is 
to  minimize  the  average  time  demands  spend  in  the  system.  The 
general  problem  is  known  as  the  m-vehicle  Dynamic  Traveling 
Repairman  Problem  (m-DTRP).  The  best  previously  known  con¬ 
trol  algorithms  rely  on  centralized  task  assignment  and  are  not 
robust  against  changes  in  the  environment.  In  this  paper,  we  first 
devise  new  control  policies  for  the  1-DTRP  that:  (1)  are  provably 
optimal  both  in  light-load  conditions  (i.e.,  when  the  arrival  rate 
for  the  demands  is  small)  and  in  heavy-load  conditions  (i.e., 
when  the  arrival  rate  for  the  demands  is  large),  and  (il)  are 
adaptive,  in  particular,  they  are  robust  against  changes  in  load 
conditions.  Then,  we  show  that  specific  partitioning  policies, 
whereby  the  environment  is  partitioned  among  the  vehicles  and 
each  vehicle  follows  a  certain  set  of  rules  within  its  own  region, 
are  optimal  in  heavy-load  conditions.  Building  upon  the  previous 
results,  we  finally  design  control  policies  for  the  m-DTRP  that  (i) 
are  adaptive  and  distributed,  and  (li)  have  strong  performance 
guarantees  in  heavy-load  conditions  and  stabilize  the  system  in 
any  load  condition. 


1.  Introduction 

Advances  in  computer  technology  and  wireless  communi¬ 
cations  have  opened  up  tremendous  possibilities  for  deploying 
large  networks  of  autonomous  mobile  vehicles  capable  of 
interacting  directly  with  the  environment.  Potential  applica¬ 
tions  are  numerous  and  include  surveillance  and  exploration 
missions,  where  vehicles  are  required  to  visit  points  of  inter¬ 
est,  and  transportation  and  distribution  tasks,  where  vehicles 
are  required  to  provide  service  or  goods  to  people.  In  this 
context,  one  of  the  major  scientific  challenges  is  the  design 
of  possibly  distributed  multi-vehicle  policies  to  assign  and 
schedule  demands  for  service  (or  targets)  that  are  defined  over 
an  extended  geographical  area. 

In  the  recent  past,  considerable  efforts  have  been  devoted 
to  the  problem  of  how  to  cooperatively  assign  and  schedule 
demands  in  multi-vehicle  systems  [l]-[6].  In  these  papers, 
the  underlying  mathematical  model  is  static  and  fits  within 
the  framework  of  the  Vehicle  Routing  Problem  (see  [7]  for 
a  thorough  introduction  to  the  VRP),  whereby:  (i)  a  team 
of  m  vehicles  is  required  to  service  a  set  of  n  demands  in 
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a  2-dimensional  space;  (ii)  each  demand  requires  a  certain 
amount  of  on-site  service;  (iii)  the  goal  is  to  compute  a  set  of 
routes  that  optimizes  the  cost  of  servicing  (according  to  some 
quality  of  service  metric)  the  demands.  The  VRP  is  static  in 
the  sense  that  vehicle  routes  are  computed  assuming  that  no 
new  demands  arrive. 

Despite  its  richness,  inherent  elegance  and  frequent  occur¬ 
rence  in  practical  problems,  the  VRP  is  often  a  static  approx¬ 
imation  to  problems  that  are  in  reality  dynamic,  and  therefore 
it  fails  to  be  an  appropriate  model  in  many  applications  of 
interest  [9].  In  fact,  demands  often  arrive  sequentially  in  time, 
and  planning  algorithms  should  actually  provide  policies  (in 
contrast  to  pre-planned  routes)  that  prescribe  how  the  routes 
should  evolve  as  a  function  of  those  inputs  that  evolve  in  real¬ 
time.  From  a  methodological  point  of  view,  dynamic  demands 
(i.e.,  demands  that  vary  over  time)  add  queueing  phenomena  to 
the  combinatorial  nature  of  the  VRP.  As  a  motivating  example, 
consider  a  surveillance  mission  where  a  team  of  Unmanned 
Aerial  Vehicles  (UAVs)  must  ensure  continued  coverage  of  a 
certain  area;  as  events  occur,  i.e.,  as  new  targets  are  detected 
by  on-board  sensors  or  other  assets  (e.g.,  intelligence,  high- 
altitude  or  orbiting  platforms,  etc.),  UAVs  must  proceed  to  the 
location  of  the  new  event  and  provide  close-range  information 
about  the  target.  Each  request  for  close-range  observation 
might  also  require  an  on-site  service  time  (e.g.,  a  time  interval 
to  identify  the  target),  which  might  be  known  a  priori  only 
through  prior  statistics.  The  UAV  mission  control  is  then  a 
continuous  process  of  detecting  targets,  planning  routes  and 
sending  UAVs.  In  such  a  dynamic  and  stochastic  setting,  the 
routing  process  might  rely  on  a  priori  statistics  about  the  future 
stream  of  targets,  and  stability  of  the  overall  system  is  an 
additional  issue  to  be  considered. 

Accordingly,  in  the  present  work  we  wish  to  address  the 
assignment  and  scheduling  problem  in  the  more  general  frame¬ 
work  where  new  demands  for  service  are  generated  over  time 
by  a  stochastic  process.  In  [10],  the  authors  present  a  setting 
where  demands  arrive  sequentially  over  time  and  routes  are 
generated  in  real  time,  however  the  optimization  problem  is 
over  a  finite  horizon  and  the  analysis  only  addresses  a  number 
of  vehicles  m  <  2.  Arguably,  the  most  general  model  for 
vehicle  routing  problems  that  have  both  a  dynamic  and  a 
stochastic  component  is  the  m-vehicle  Dynamic  Traveling 
Repairman  Problem  (m-DTRP)  [8],  [9],  [11],  [12].  In  the  m- 
DTRP,  m  vehicles  operating  in  a  bounded  environment  and 
traveling  with  bounded  velocity  must  service  demands  whose 
time  of  arrival,  location  and  on-site  service  are  stochastic  (in 
the  UAV  example,  the  inter-arrival  times  might  be  exponential 
random  variables,  the  demands’  locations  might  be  uniformly 
distributed  within  the  workspace,  and  the  time  intervals  re¬ 
quired  to  identify  the  targets  might  be  exponential  random 


Report  Documentation  Page 


Form  Approved 
0MB  No.  0704-0188 


Public  reporting  burden  for  the  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington 
VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  a  penalty  for  failing  to  comply  with  a  collection  of  information  if  it 
does  not  display  a  currently  valid  0MB  control  number. 


1.  REPORT  DATE 

18  NOV  2010 


2.  REPORT  TYPE 


4.  TITLE  AND  SUBTITLE 

Adaptive  And  Distributed  Algorithms  For  Vehicle  Routing  In  A 
Stochastic  And  Dynamic  Environment 

6.  AUTHOR(S) 


7.  PEREORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

California  Institute  of  Technology, Jet  Propulsion 
Laboratory, Pasadena, CA, 91 109 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 


3.  DATES  COVERED 

00-00-2010  to  00-00-2010 

5a.  CONTRACT  NUMBER 

5b.  GRANT  NUMBER 

5c.  PROGRAM  ELEMENT  NUMBER 

5d.  PROJECT  NUMBER 

5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 

8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 


10.  SPONSOR/MONITOR’S  ACRONYM(S) 

11.  SPONSOR/MONITOR’S  REPORT 
NUMBER(S) 


12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

IEEE  Transactions  on  Automatic  Control,Volume:  PP  Issue:99,On  page(s):  1  - 1  ISSN:  0018-9286, 
Government  or  Eederal  Purpose  Rights  License. 

14.  ABSTRACT 

In  this  paper  we  present  adaptive  and  distributed  algorithms  for  motion  coordination  of  a  group  of  m 
vehicles.The  vehicles  must  service  demands  whose  time  of  arrival,  spatial  location  and  service  requirement 
are  stochastic;  the  objective  is  to  minimize  the  average  time  demands  spend  in  the  system.  The  general 
problem  is  known  as  the  m-vehicle  Dynamic  Traveling  Repairman  Problem  (m-DTRP).  The  best 
previously  known  control  algorithms  rely  on  centralized  task  assignment  and  are  not  robust  against 
changes  in  the  environment.  In  this  paper,  we  first  devise  new  control  policies  for  the  1-DTRP  that:  (i)  are 
provably  optimal  both  in  light-load  conditions  (i.e.,  when  the  arrival  rate  for  the  demands  is  small)  and  in 
heavy-load  conditions  (i.e.,when  the  arrival  rate  for  the  demands  is  large),  and  (ii)  are  adaptive,  in 
particular,  they  are  robust  against  changes  in  load  conditions.  Then,  we  show  that  specific  partitioning 
policies  whereby  the  environment  is  partitioned  among  the  vehicles  and  each  vehicle  follows  a  certain  set  of 
rules  within  its  own  region,are  optimal  in  heavy-load  conditions.  Building  upon  the  previous  results,  we 
finally  design  control  policies  for  the  m-DTRP  that  (i)are  adaptive  and  distributed,  and  (ii)  have  strong 
performance  guarantees  in  heavy-load  conditions  and  stabilize  the  system  in  any  load  condition. 

15.  SUBJECT  TERMS 


16.  SECURITY  CLASSIFICATION  OF: 

17.  LIMITATION  OF 

18.  NUMBER 

19a.  NAME  OE 

ABSTRACT 

OF  PAGES 

RESPONSIBLE  PERSON 

a.  REPORT 

unclassified 

b.  ABSTRACT 

unclassified 

c.  THIS  PAGE 

unclassified 

Same  as 
Report  (SAR) 

16 

2 


TABLE  I 

Adaptive  policies  eor  the  1-DTRP 


Properties 

DC  Policy,  r  ^  oo 

DC  Policy,  r  =  1 

RH  Policy 

UTSP  Policy  [8],  r  ^  oo 

Light-load  performance 

optimal 

optimal 

optimal 

not  optimal 

Heavy-load  performance 

optimal 

within  100% 
of  the  optimum 

within  100% 
of  the  optimum* 

optimal 

Adaptive 

yes,  except  for  / 

yes 

yes 

no 

TABLE  II 

Distributed  and  adaptive  policies  for  the  m-DTRP 


Properties 

m-DC  Policy, 

r  oo 

m-RH  Policy 

UTSP  Policy  [8], 

r  oo 

Light-load  performance 

“locally”  optimal 

“locally”  optimal 

not  optimal 

Heavy-load  performance: 
a)  uniform  / 

optimal 

within  100%  of  optimal  unbiased*’^ 

optimal 

b)  general  /  and  s  =  0 

optimal 

within  100%  of  optimal  unbiased*’^ 

optimal 

c)  general  /  and  5  >  0 

within  m  of  optimal  unbiased^ 

within  2m  of  optimal  unbiased*’^ 

optimal 

Adaptive 

yes,  except  for  / 

yes,  except  for  / 

no 

Distributed 

yes 

yes 

no 

variables  as  well).  The  objective  is  to  find  a  policy  to  service 
demands  over  an  infinite  horizon  that  minimizes  the  expected 
system  time  (wait  plus  service)  of  the  demands.  The  best 
previously  known  control  policies  for  the  m-DTRP  rely  on 
centralized  task  assignment  and  are  not  robust  against  changes 
in  the  environment,  in  particular  changes  in  load  conditions; 
therefore,  they  are  of  limited  applicability  in  scenarios  involv¬ 
ing  ad-hoc  networks  of  autonomous  vehicles  operating  in  a 
time-varying  environment. 

In  this  paper  we  devise  adaptive  and  distributed  policies 
for  the  m-DTRP.  We  mainly  focus  on  policies  that  in  heavy 
load  are  constrained  to  provide  spatially-unbiased  service, 
i.e.,  the  same  quality  of  service  to  all  outstanding  demands 
independently  of  their  locations  (we  refer  to  such  policies  as 
unbiased  policies).  We  argue  in  Section  III  how  such  constraint 
is  indeed  natural  for  applications.  The  contribution  is  threefold: 
First,  we  present  a  new  class  of  policies  for  the  1-DTRP 
that  are  adaptive  (e.g.,  to  the  load  conditions)  and  satisfy 
the  aforementioned  constraint.  In  particular,  we  propose  the 
Divide  &  Conquer  (DC)  policy,  whose  performance  depends 
on  a  design  parameter  r  G  N>o.  If  r  — )■  -boo,  the  policy  is  (i) 
provably  optimal  both  in  light-  and  in  heavy-load  conditions, 
and  (ii)  adaptive  with  respect  to  changes  in  the  load  conditions 
and  in  the  statistics  of  the  on-site  service  requirement;  if, 
instead,  r  =  1,  the  policy  is  (i)  provably  optimal  in  light-load 
conditions  and  within  a  factor  2  of  the  optimal  performance 
in  heavy-load  conditions,  and  (ii)  adaptive  with  respect  to 
all  problem  data,  in  particular,  and  perhaps  surprisingly,  it 
does  not  require  any  knowledge  about  the  demand  generation 
process.  Moreover,  by  applying  ideas  of  receding-horizon 
control  to  dynamic  vehicle  routing  problems,  we  introduce 
the  Receding  Horizon  (RH)  policy,  that  also  does  not  require 
any  knowledge  about  the  demand  generation  process;  we 
show  that  the  RH  policy  is  optimal  in  light  load  and  stable 
in  any  load  condition,  and  we  heuristically  argue  that  its 
performance  is  close  to  optimality  in  heavy-load  conditions 
(in  particular,  we  heuristically  argue  that  the  RH  policy  is  the 
best  available  unbiased  and  adaptive  policy  for  the  1-DTRP). 
Second,  we  show  that  specific  partitioning  policies,  whereby 


the  environment  is  partitioned  among  the  vehicles  and  each 
vehicle  follows  a  certain  set  of  rules  within  its  own  region, 
are  optimal  in  heavy-load  conditions.  Finally,  by  combining 
the  DC  policy  with  suitable  partitioning  policies,  we  design  a 
routing  policy  for  the  m-DTRP  (called  m-DC  policy)  that  (i)  is 
spatially  distributed,  scalable  to  large  networks,  and  adaptive  to 
network  changes,  (ii)  is  within  a  constant  factor  of  the  optimal 
unbiased  performance  in  heavy-load  conditions  (in  particular, 
it  is  an  optimal  unbiased  policy  when  demands  are  uniformly 
dispersed  over  the  environment  or  when  the  average  on-site 
service  time  s  is  zero)  and  stabilizes  the  system  in  any  load 
condition.  Here,  by  network  changes  we  mean  changes  in  the 
number  of  vehicles,  in  the  arrival  rate  of  demands,  and  in 
the  characterization  of  the  on-site  service  requirement.  We 
mention  that  the  problems  addressed  in  this  paper,  and  the 
proposed  algorithms,  can  not  be  cast  within  the  framework  of 
“traditional”  queueing  systems,  and  thus  their  analysis  requires 
the  use  of  non-standard  techniques.  Tables  I  and  II  provide  a 
synoptic  view  about  the  main  properties  of  the  routing  policies 
that  we  propose  and  analyze  in  this  paper;  our  policies  are 
compared  with  the  best  spatially-unbiased  policy  available  in 
the  literature,  i.e.,  the  UTSP  policy  with  r  —>  oo  [8].  (In  Tables 
I  and  II  an  asterisk  *  signals  that  the  result  is  heuristic,  a 
dagger  f  means  that  the  spatially-unbiased  constraint  might  be 
violated  in  heavy  load,  /  denotes  the  spatial  probability  density 
for  the  demands’  locations,  m  is  the  number  of  vehicles,  and 
“optimal  unbiased”  stands  for  optimal  performance  for  policies 
providing  spatially-unbiased  service.) 

The  paper  is  structured  as  follows.  In  Section  II  we  pro¬ 
vide  some  background  on  the  Euclidean  Traveling  Salesman 
Problem,  and  on  the  Continuous  Multi-Median  Problem.  In 
Section  III  we  formulate  the  problem  we  wish  to  address,  and 
we  review  existing  results.  In  Sections  IV  and  V  we  present 
and  analyze,  respectively,  the  Divide  (&  Conquer  policy  and  the 
Receding  Horizon  policy.  In  Section  VI  we  discuss  partitioning 
policies  and  we  design  distributed  algorithms  for  the  m-DTRP. 
In  Section  VII  we  discuss  the  particular  case  when  the  on-site 
service  time  is  zero,  and  in  Section  VIII  we  present  results 
from  numerical  experiments.  Finally,  in  Section  IX  we  draw 
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some  conclusions  and  discuss  some  directions  for  future  work. 
11.  Preliminaries 

In  this  section,  we  briefly  describe  some  known  concepts 
from  combinatorics  and  locational  optimization,  on  which  we 
will  rely  extensively  later  in  the  paper. 


1  inkern';  this  powerful  solver  yields  approximations  in  the 
order  of  5%  of  the  optimal  tour  cost  very  quickly  for  many 
instances.  If  a  constant-factor  approximation  algorithm  is  used, 
the  effect  on  the  asymptotic  performance  guarantees  of  our 
algorithms  can  be  simply  modeled  as  a  scaling  of  the  constant 
/^TSP.d- 


A.  Equitable  Partitions 

Let  Q  C  be  a  compact,  convex  set.  An  r-partition 
of  Q  (where  r  €  N,  r  >  1)  is  a  collection  of  r  closed 
subsets  {Qk}k=i  with  disjoint  interiors  whose  union  is  Q. 
Given  a  measurable  function  f  :  Q  ^  IR>o>  r-partition 
{2fc}fc=i  is  equitable  with  respect  to  /  if  ^Q^f{x)dx  = 
Jq  f{x)  dx/r  for  all  k  G  r}.  Similarly,  given  two 

measurable  functions  fj'.Q—>-  M>o,  j  €  {Ij  2},  an  r-partition 
{2fc}fc=i  is  simultaneously  equitable  with  respect  to  fi  and 
/2  if  jQ^fj{x)dx  =  jQfj(x)dx/r  for  all  k  G  r} 

and  j  G  {1,2}.  Theorem  12  in  [13]  shows  that,  given  two 
measurable  functions  IR>o,  J  €  (1,  2},  there  always 

exists  an  r-partition  of  Q  that  is  simultaneously  equitable  with 
respect  to  fi  and  /2  and  where  the  subsets  Qk  are  convex. 


B.  Asymptotic  and  Worst-Case  Properties  of  the  Euclidean 
Traveling  Salesman  Problem 

The  Euclidean  Traveling  Salesman  Problem  (for  short,  TSP) 
is  formulated  as  follows:  given  a  set  Dofn  points  in  M'',  find  a 
minimum-length  tour  (i.e.,  a  cycle  that  visits  all  nodes  exactly 
once)  of  U;  the  length  of  a  tour  is  the  sum  of  all  Euclidean 
distances  on  the  tour.  The  decision  version  of  the  TSP  belongs 
to  the  class  of  NP-complete  problems,  which  suggests  that 
there  is  no  general  algorithm  capable  of  finding  the  optimal 
tour  in  an  amount  of  time  polynomial  in  the  size  of  the  input. 
Let  TSP(Z?)  denote  the  minimum  length  of  a  tour  through  all 
the  points  in  by  convention,  TSP(0)  =  0.  Assume  that  the 
locations  of  the  n  points  are  random  variables  independently 
and  identically  distributed  in  a  compact  set  Q  C  in  [14] 
it  is  shown  that  there  exists  a  constant  /^TSP.d  such  that 


E[TSP(i:))] 

lim  - ,  ,  . , — 

n— >-+oo  77,-*-  -*-/“ 


(1) 


where  /  is  the  density  of  the  absolutely  continuous  part  of  the 
point  distribution.  Eor  the  case  d  =  2,  the  constant  /3tsp,2  has 
been  estimated  numerically  as  /3tsp,2  —  0.7120±0.0002  [15]. 

Eor  any  compact  environment  Q  C  the  following 
(deterministic)  bound  holds  on  the  length  of  the  optimal  TSP 
tour,  uniformly  on  n  [14]: 


TSP(i?)</?e.d|Qr/‘^n'-'/^  (2) 


where  |Q|  is  the  hypervolume  of  Q  and  is  a  constant 
generally  larger  than  /^TSP.d- 

In  the  following,  we  will  present  algorithms  that  re¬ 
quire  on-line  solutions  of  possibly  large  TSPs.  Practical  im¬ 
plementations  of  the  algorithms  will  rely  on  heuristics  or 
on  polynomial-time  approximation  schemes,  such  as  Lin- 
Kernighan’s  [16]  or  Christofides’  [17].  A  modified  ver¬ 
sion  of  the  Lin-Kernighan  heuristic  [16]  is  implemented  in 


C.  The  Continuous  Multi-Median  Problem 

Given  a  compact  set  Q  C  and  a  vector  P  = 
(pi, . . .  ,Pm)  of  tn  distinct  points  in  Q,  the  expected  distance 
between  a  random  point  x,  generated  according  to  a  proba¬ 
bility  density  function  /,  and  the  closest  point  in  P  is  given 
by 


H^iP,Q) 


E 


min  \\pk  -  x\\ 


m  « 

/  \\pk-x\\f{x)dx, 

k—1 


where  V(P,  Q)  =  (Vi(P,  Q),  •  •  • ,  Vm(P,  Q))  is  the  Voronoi 
partition  of  the  set  Q  generated  by  the  points  P.  In  other 
words,  X  G  Vfc(P,  Q)  if  and  only  if  ||a;— pfe||  <  ||a;— pjH,  for  all 
j  G  m}.  The  set  Vk  is  referred  to  as  the  Voronoi  cell  of 

the  generator  p^.  The  function  Hm  is  known  in  the  locational 
optimization  literature  as  the  continuous  Weber  function  or 
the  continuous  multi-median  function;  see  [18]  and  references 
therein. 

The  m-median  of  the  set  Q,  with  respect  to  the  mea¬ 
sure  induced  by  /,  is  the  global  minimizer  Pf^{Q)  = 
argminpge™  Q).  We  let  H*^{Q)  =  H^iPf^{Q),  Q) 

be  the  global  minimum  of  Hm-  It  is  straightforward  to  show 
that  the  map  P  PI i  {P,  Q)  is  differentiable  and  strictly 
convex  on  Q.  Therefore,  it  is  a  simple  computational  task 
to  compute  P}(Q).  We  refer  to  Pi{Q)  as  the  median  of  Q. 
On  the  other  hand,  when  m  >  1,  the  map  P  i— >  P[m{P,  Q) 
is  differentiable  (whenever  (pi, . . .  ,Pm)  are  distinct)  but  not 
convex,  thus  making  the  solution  of  the  continuous  m-median 
problem  hard  in  the  general  case.  The  set  of  critical  points  of 
Pirn  contains  all  configurations  (pi, . . .  ,Pm)  with  the  property 
that  each  point  pk  is  the  generator  of  the  Voronoi  cell  Vk  {P,  Q) 
as  well  as  the  median  of  Vk{P,  Q)  (we  refer  to  such  Voronoi 
diagrams  as  median  Voronoi  diagrams);  in  particular,  the 
global  minimum  of  must  be  one  of  these  configurations. 


III.  Problem  Eormulation  and  Existing  Results 
In  this  section  we  formulate  the  problem  we  wish  to  address 
and  we  review  existing  results. 


A.  Problem  Formulation 

The  problem  we  consider  in  this  paper  is  known  as  the  m- 
vehicle  Dynamic  Traveling  Repairman  Problem  (m-DTRP), 
and  was  introduced  by  Psaraftis  in  [11]  and  later  studied  by 
Bertsimas  and  van  Ryzin  in  [8],  [9],  [12]. 

Let  the  workspace  Q  C  be  a  compact,  convex  set,  and 
let  II  •  II  denote  the  Euclidean  norm  in  We  define  the 

*1  inkern  is  freely  available  for  academic  research  use  at 
WWW . tsp . gatech . edu/ concorde/index . html. 
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diameter  of  Q  as:  diam(Q)  =  sup{||p  —  q\\\p,q  G  Q}.  For 
simplicity,  we  will  only  consider  the  planar  case,  i.e.,  d  =  2, 
with  the  understanding  that  extensions  to  higher  dimensions 
are  possible.  Consider  m  holonomic  vehicles,  modeled  as 
point  masses,  and  let  P{t)  =  . . .  ,pm{t))  G  Q™ 

describe  the  locations  of  the  vehicles  at  time  t.  The  vehicles 
are  free  to  move,  traveling  at  a  constant  velocity  v,  within  the 
environment  Q.  The  vehicles  are  identical,  and  have  unlimited 
range  and  demand  servicing  capacity. 

Demands  are  generated  according  to  a  homogeneous  (i.e., 
time-invariant)  spatio-temporal  Poisson  process^,  with  time 
intensity  A  G  K>o,  and  spatial  density  /  :  Q  — >  ffi>o-  In  other 
words,  demands  arrive  to  Q  according  to  a  Poisson  process 
with  intensity  A  and  their  locations  {Xj;  j  ^  1}  are  i.i.d. 
(i.e.,  independent  and  identically  distributed)  and  distributed 
according  to  a  density  /  whose  support  is  Q.  A  demand’s 
location  becomes  known  (is  realized)  at  its  arrival  epoch;  thus, 
at  time  t  vehicles  know  with  certainty  the  locations  of  demands 
that  arrived  prior  to  time  t,  but  future  demand  locations  form 
an  i.i.d.  sequence.  The  density  /  satisfies: 

P[ArjG5]=  f  f{x)dx  V5  C  Q,  and  f  f{x)dx  =  l. 

Js  Jq 

We  assume  that  the  number  of  initial  outstanding  demands  is 
a  random  variable  with  finite  first  and  second  moments. 

At  each  demand  location  Xj  vehicles  spend  some  time  Sj 
in  on-site  service  that  is  i.i.d.  and  generally  distributed  with 
finite  first  and  second  moments  denoted  by  s  and  (vve  also 
assume  s  >  0;  we  will  discuss  in  Section  VII  the  case  s  =  0). 
A  realized  demand  is  removed  from  the  system  after  one  of 
the  vehicles  has  completed  its  on-site  service.  We  define  the 
load  factor  g  =  Xs /m. 

The  system  time  of  demand  j  (demands  are  labeled  in 
an  increasing  order  with  respect  to  time  of  arrival),  denoted 
by  Tj,  is  defined  as  the  elapsed  time  between  the  arrival  of 
demand  j  and  the  time  one  of  the  vehicles  completes  its 
service.  The  waiting  time  of  demand  j,  Wj,  is  defined  by 
Wj  =  Tj  —  Sj.  The  steady-state  system  time  is  defined  by 
T  =  limsupj_).go  E  [Tj].  A  policy  for  routing  the  vehicles 
is  said  to  be  stable  if  the  expected  number  of  demands  in 
the  system  is  uniformly  bounded  at  all  times.  A  necessary 
condition  for  the  existence  of  a  stable  policy  is  that  Q  <  f 
we  shall  assume  g  <  1  throughout  the  paper.  When  we  refer 
to  light-load  conditions,  we  consider  the  case  p  — >  0+,  in  the 
sense  that  A  — )■  O'*";  when  we  refer  to  heavy-load  conditions, 
we  consider  the  case  g  — >  1“,  in  the  sense  that  A  — >■  \m/s\~. 

Let  V  be  the  set  of  all  causal,  stable  and  stationary  routing 
policies.  Letting  denote  the  system  time  of  a  particular 
policy  TT  G  T,  the  m-DTRP  is  then  defined  as  the  problem  of 
finding  a  policy  tt*  (if  one  exists)  such  that 

=  inf  T^. 

TZ^V 

^There  are  three  main  reasons  behind  the  choice  of  modeling  the  demand 
arrival  process  with  a  Poisson  process:  First,  the  system  we  deal  with  is  a 
spatial  queueing  system,  and  the  arrival  process  to  a  queueing  system  is  often 
well  approximated  by  a  Poisson  process  (see,  e.g.,  [19]).  Second,  the  Poisson 
process  leads  to  a  model  that  is  analytically  tractable.  Third,  the  Poisson 
process,  being  “the  most  random”  way  to  distribute  events  in  time,  leads 
to  useful  ’’worst-case”  scenario  evaluations  (worst-case  with  respect  to  the 
possible  types  of  arrival  processes). 


We  let  T  denote  the  infimum  in  the  right  hand  side  above. 

We  finally  need  two  definitions;  in  these  definitions,  X  is  the 
location  of  a  randomly  chosen  demand  and  W  is  its  waiting 
time. 

Definition  3.1:  A  policy  tt  is  called  spatially  unbiased  if 
for  every  pair  of  sets  5i,  ^2  C  Q  it  holds  E  \W  \X  G  5i]  = 
E  \W  \X  G  52];  a  policy  tt  is  instead  called  spatially  biased 
if  there  exist  sets  5i,  S2  ^  Q  such  that  E  [W  \X  G  5i]  > 
E[IL|X  G  52]. 


B.  Existing  Results  for  the  m-DTRP 

As  in  many  queueing  problems,  the  analysis  of  the  m- 
DTRP  for  all  values  of  p  G  (0,  1)  is  difficult.  In  [8],  [9], 
[12],  [20],  lower  bounds  for  the  optimal  steady-state  system 
time  are  derived  for  the  light-load  case  (i.e.,  g  — >  0+),  and  for 
the  heavy-load  case  (i.e.,  g  — >  1“).  Subsequently,  policies  are 
designed  for  these  two  limiting  regimes,  and  their  performance 
is  compared  to  the  lower  bounds. 

1)  Lower  Bounds:  For  the  light-load  case,  a  tight  lower 
bound  on  the  system  time  is  derived  in  [12].  In  the  light-load 
case,  the  lower  bound  on  the  system  time  is  strongly  related 
to  the  solution  of  the  multi-median  problem: 

T*  >^HI{Q)  +  s,  asp^0+.  (3) 

This  bound  is  tight:  there  exist  policies  whose  system  times, 
in  the  limit  p  — >  O’*",  attain  this  bound;  we  present  such 
asymptotically  optimal  policies  in  Section  III-B2. 

The  following  two  lower  bounds  hold  in  the  heavy-load  case 

[8],  [20].  Within  the  class  in  V  of  spatially-unbiased  policies 

- =1= 

the  optimal  system  time  Ty  is  lower  bounded  by 


-*  /3T-  -  ^ 


-'TSP,2 


/o  f^^'^ix)dx 


-,2  9,2 


as  p  — >  1  .  (4) 


Within  the  class  in  V  of  spatially-biased  policies  the  optimal 
system  time  Tg  is  lower  bounded  by 


♦  /3tSP.2 


T^> 


fo  P^^{x)dx 


1  3 


')2  ^i2 


as  p  — )■  1  .  (5) 


Both  bounds  (4)  and  (5)  are  tight:  there  exist  policies  whose 
system  times  attain  these  bounds,  in  the  limit  p  — >  1”. 
Therefore  the  inequalities  in  (4)  and  (5)  should  indeed  be 
replaced  with  equalities.  We  present  asymptotically  optimal 
policies  for  the  heavy-load  case  in  Section  III-B3.  It  is  shown 
in  [8]  that  the  lower  bound  in  equation  (5)  is  always  lower  than 
or  equal  to  the  lower  bound  in  equation  (4)  for  all  densities 
/■ 

Remark  3.2:  In  equations  (4)  and  (5),  the  right-hand  side 
approaches  -foo  as  p  — >  1~ .  Thus,  one  should  more  formally 
write  the  inequalities  with  T  (1  —  p)^  on  the  left-hand  side, 
so  that  the  right-hand  side  is  finite.  However,  this  makes  the 
presentation  less  readable,  and  thus,  in  the  remainder  of  the 
paper,  we  adhere  to  the  less  formal  but  more  transparent  style 
of  equations  (4)  and  (5).  Furthermore,  this  is  the  style  generally 
used  in  the  literature  on  the  m-DTRP  (see,  e.g.,  [8]  Section  4 
and  [21]  Proposition  2). 
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2)  Optimal  Policies  for  the  Light-Load  Case:  For  the  light¬ 
load  case,  to  the  best  of  our  knowledge,  there  is  only  one 
policy  that  is  known  to  be  optimal;  this  policy  is  as  follows. 

The  m  Stochastic  Queue  Median  (SQM)  Policy 

[12]  —  Locate  one  vehicle  at  each  of  the  m-median 
locations  for  the  environment  Q.  When  demands 
arrive,  assign  them  to  the  nearest  median  location 
and  its  corresponding  vehicle.  Have  each  vehicle 
service  its  respective  demands  in  First-Come,  First- 
Served  (FCFS)  order  returning  to  its  median  after 
each  service  is  completed. 

This  policy,  although  optimal  in  light  load,  has  two  character¬ 
istics  that  limit  its  application  to  large-scale  vehicle  networks: 
First,  it  quickly  becomes  unstable  as  the  load  increases,  i.e., 
there  exists  Pc  <  1  such  that  for  all  g  >  Qc  the  system  time 
Tsqm  is  infinite  (hence,  this  policy  is  not  adaptive  with  respect 
to  load  conditions).  Second,  a  central  entity  needs  to  compute 
the  TO-median  locations  and  assign  them  to  the  vehicles  (hence, 
it  is  centralized). 

3}  Optimal  Policies  for  the  Heavy-Load  Case:  For  the 
heavy-load  case,  to  the  best  of  our  knowledge,  there  is  only 
one  spatially-unbiased  policy  that  has  been  proven  to  achieve 
the  lower  bound  in  equation  (4);  this  policy  is  as  follows. 


The  Unbiased  TSP  (UTSP)  Policy  [8]  —  Let  r  be 

a  fixed  positive,  large  integer.  From  a  central  point 
in  the  interior  of  Q,  subdivide  the  service  region  into 
r  wedges  Qi,  Q2,  •  •  • ,  Qr  such  that  f{x)dx  = 
1/r,  k  €.  {l,...,r}.  Within  each  subregion,  form 
sets  of  size  n/r  (n  is  a  design  parameter).  As  sets  are 
formed,  deposit  them  in  a  queue  and  service  them 
FCFS  with  the  first  available  vehicle  by  forming  a 
TSP  tour  on  the  set  and  following  it  in  an  arbitrary 
direction.  Optimize  over  n. 


It  is  possible  to  show  [8]  that 
limr_>oo  linig-).!- Tutsp(^’)/?"u  =  1’  hence  the  UTSP 

policy  with  r  -foo  is  an  optimal  spatially-unbiased  policy 
in  heavy  load. 

The  same  paper  [8]  presents  an  optimal  spatially-biased 
policy.  This  policy,  called  Biased  TSP  (BTSP)  Policy,  relies 
on  an  even  finer  partition  of  the  environment,  requires  /  to  be 
piecewise  constant,  and  also  depends  on  a  parameter  n. 

Although  both  the  UTSP  and  the  BTSP  policies  are  op¬ 
timal  within  their  respective  classes,  they  have  two  major 
drawbacks:  First,  in  the  UTSP  policy,  to  ensure  stability, 
n  should  be  chosen  so  that  (see  [8],  page  961)  n  > 


^^/^TSP  2  [/q —  p)^);  therefore,  to 
ensure  stability  over  a  wide  range  of  values  of  g,  the  system 
designer  is  forced  to  select  a  large  value  for  n.  However,  if 
during  the  execution  of  the  policy  the  load  factor  turns  out  to 
be  only  moderate,  demands  have  to  wait  for  an  excessively 
large  set  to  be  formed,  and  the  overall  system  performance 
deteriorates  significantly.  Similar  considerations  hold  for  the 
BTSP  policy.  Hence,  these  two  policies  are  not  adaptive  with 
respect  to  load  conditions.  Second,  both  policies  require  a 
centralized  data  structure  (the  demands’  queue  is  shared  by 
the  vehicles);  hence,  both  policies  are  centralized. 


C.  Objective  of  the  Paper 

In  many  applications  it  is  desirable  to  provide  spatially- 
unbiased  service,  i.e.,  the  same  quality  of  service  to  all  of  the 
outstanding  demands  independently  of  their  locations;  e.g.,  in 
the  UAV  example  discussed  in  the  introduction,  outstanding 
“threats”  at  the  periphery  of  Q  should  receive  the  same  quality 
of  service  as  the  outstanding  “threats”  in  the  middle  of  Q. 
However,  in  light  load,  when  most  of  the  time  the  vehicles 
are  idling,  it  is  more  reasonable  to  consider  spatially-biased 
policies  that  aim  at  positioning  the  vehicles  in  the  best  a  priori 
locations,  i.e.,  at  the  locations  that  minimize  the  expected 
distance  to  the  next  demand. 

Accordingly,  in  this  paper  we  focus  on  policies  for  the 
m-DTRP  that  are  constrained  to  provide  spatially-unbiased 
service  in  heavy  load,  but  can  provide  spatially-biased  service 
in  light  load.  With  a  slight  abuse,  we  refer  to  policies  satisfying 
this  constraint  as  unbiased  policies.  The  objective  is  to  devise 
adaptive,  distributed,  scalable  unbiased  policies  with  provable 
performance  guarantees,  that  rely  only  on  local  exchange  of 
information  between  neighboring  vehicles,  and  do  not  require 
explicit  knowledge  of  the  global  structure  of  the  network. 

The  key  idea  that  we  will  pursue  in  this  paper  is  that  of 
partitioning  policies.  Given  a  policy  tt  for  the  1-DTRP  and  m 
vehicles,  a  7r-partitioning  policy  is  a  multi-vehicle  policy  such 
that  1)  the  environment  Q  is  partitioned  according  to  some 
partitioning  scheme  into  m  openly  disjoint  subregions  Qk, 
k  G  {1, . . . ,  m},  whose  union  is  Q,  2)  one  vehicle  is  assigned 
to  each  subregion  (thus,  there  is  a  one-to-one  correspondence 
between  vehicles  and  subregions),  and  3)  each  vehicle  executes 
the  single-vehicle  policy  tt  to  service  demands  that  fall  within 
its  own  subregion.  The  SQM  policy,  which  is  optimal  in  light 
load,  is  indeed  a  partitioning  policy  whereby  Q  is  partitioned 
according  to  a  median  Voronoi  diagram  that  “minimizes”  H^, 
and  each  vehicle  executes  inside  its  own  Voronoi  cell  the 
single-vehicle  policy  “service  FCFS  and  return  to  the  assigned 
median  after  each  service  completion”.  Moreover,  in  Section 
VI,  we  will  prove  that  in  heavy  load,  given  a  single-vehicle, 
optimal  unbiased  policy  tt*  and  m  vehicles,  there  exists  an 
unbiased  tt* -partitioning  policy  that  is  optimal. 

The  above  discussion  leads  to  the  following  strategy:  First, 
for  the  1-DTRP,  we  design  unbiased  and  adaptive  control 
policies  with  provable  performance  guarantees.  Then,  by  using 
recently  developed  distributed  algorithms  for  environment  par¬ 
titioning,  we  extend  these  single-vehicle  policies  to  distributed 
multi-vehicle  policies. 

IV.  The  Single-Vehicle  Divide  &  Conquer  Policy 

We  describe  the  Divide  &  Conquer  (DC)  policy  in  Algo¬ 
rithm  1,  where  D  is  the  set  of  outstanding  demands  waiting 
for  service  and  p  is  the  current  position  of  the  vehicle.  An 
r-partition  that  is  simultaneously  equitable  with  respect  to  / 
and  is  guaranteed  to  exist  (see  discussion  on  equitable 
partitions  in  Section  II).  The  number  of  subregions  r  is  a 
design  parameter  whose  choice  will  be  discussed  after  the 
analysis  of  the  policy.  One  should  note  that  P*  (which  is 
defined  in  Algorithm  1)  is  indeed  the  empirical  median.  Next, 
we  characterize  the  DC  policy.  One  can  easily  show  that  the 
DC  policy  is  unbiased. 
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Algorithm  1:  Divide  &  Conquer  (DC)  Policy 

Assumes:  An  r-partition  of  Q  that  is  simultaneously 

equitable  with  respect  to  /  and 

1  if  D  =  0  then 

2  Let  Pi  be  the  point  minimizing  the  sum  of  distances  to 
demands  serviced  in  the  past  (the  initial  condition  for  Pi  is 
a  random  point  in  Q);  if  p  7^  P* ,  move  to  Pi ,  otherwise 
stop. 

3  else 

4  k  ko  (ko  is  the  subregion  with  the  largest  number  of 
outstanding  demands). 

5  repeat 

6  Move  to  the  median  of  subregion  Qk. 

7  Compute  a  TSP  tour  through  all  demands  in  subregion 
Qk-  Service  all  demands  by  following  the  TSP  tour, 
starting  at  a  random  demand  on  the  TSP  tour. 

8  fc  fc  +  1  modulo  r. 

9  until  D  —  % 

10  Repeat. 


A.  Analysis  of  the  DC  Policy  in  Light  Load 


We  first  study  the  light-load  case  (i.e.,  g  — >  0+). 

Theorem  4.1  (Performance  of  DC  policy  in  light  load): 

As  g  —>■  0+,  for  every  r,  the  DC  policy  is  asymptotically 
optimal,  that  is,  Tucir)  —>7’*,  as  p  — )■  0+. 

Proof:  Using  arguments  similar  to  those  in  [21],  consider 
generic  initial  conditions  in  Q  for  the  vehicle’s  position 
and  for  the  positions  of  the  outstanding  demands  (denote 
the  initial  number  of  demands  with  no).  Under  the  DC 
policy,  an  upper  bound  on  the  time  Cq  needed  to  service 
all  of  the  initial  demands  and  move  to  the  empirical  me¬ 
dian  Pi  is:  Co  <  (no  -I-  l)diam(Q)/v  +  Thus, 

we  have  E  [Cq]  <  E  [no](diam(Q)/n  -b  s)  -b  diam(Q)/n. 
Given  the  time  interval  Cq  =  t,  t  G  K>o,  the  proba¬ 
bility  that  no  new  demand  arrives  during  such  time  inter¬ 
val  is  P  (fV(Co)  =  0|Co  =  f)  =  exp(— At),  where  N{t)  is 
the  counting  variable  associated  to  a  Poisson  process  with 
rate  A.  Hence,  by  applying  the  law  of  total  probability 
and  Jensen’s  inequality  for  convex  functions  one  obtains: 
P[iV(Co)=0]  =  /“P(iV(Co)=0|Co  =  f)/co(f)df  = 
/“exp(-Af)/co(f)df  >  exp(^-A/“f/co(f)df)  = 

exp(-AE  [Co])  >  exp(-AE  [no]  ( +  s)  - 
where  fc^  is  the  density  function  of  random  variable  Cq. 
Since,  by  assumption,  E  [no]  <  00  and  p  — )■  O’*"  (i.e., 
A  — 1-  0+),  we  obtain  P[At(Co)  =  0]  — >  1“  as  p  — )■  O'*".  As 
a  consequence,  after  an  initial  transient,  all  demands  will  be 
generated  with  the  vehicle  at  the  empirical  median  P*  and 
with  an  empty  demand  queue,  with  probability  1.  Then,  in  light 
load,  the  DC  policy  becomes  identical  to  the  light-load  routing 
policy  presented  in  [22];  since  such  routing  policy  is  optimal, 
in  light  load,  for  the  1-DTRP  [22],  we  obtain  the  desired  result 
(the  proof  in  [22]  basically  shows  that  P^  — >  Pi{Q)).  ■ 


B.  Analysis  of  the  DC  Policy  in  Heavy  Load 

Next,  we  consider  the  DC  policy  in  heavy  load,  i.e.,  when 
p  — >  1“.  The  performance  of  the  DC  policy  in  heavy  load  is 
characterized  by  the  following  theorem. 

Theorem  4.2  (Performance  of  DC  policy  in  heavy  load): 

As  p  — >  1~,  the  system  time  for  the  DC  policy  satishes 


Tr,c{r)  <  (1  +  ^) 


A 

Ptsp,2 


dx 


(1  -  Qf 


(6) 


where  r  is  the  number  of  subregions. 

The  proof  of  Theorem  4.2  relies  on  techniques  similar  to 
those  we  developed  in  [23],  [24],  and  builds  on  a  number 
of  intermediate  results.  We  start  with  the  following  lemma, 
similar  to  Lemma  1  in  [21],  characterizing  the  number  of 
outstanding  demands  in  heavy  load. 

Lemma  4.3  (Number  of  demands  in  heavy  load):  In  heavy 
load  (i.e.,  p  — >  1“),  after  a  transient,  the  number  of  demands 
serviced  in  a  single  tour  of  the  vehicle  in  the  DC  policy  is 
very  large  with  high  probability  (i.e.,  the  number  of  demands 
tends  to  -boo  with  probability  that  tends  to  1,  as  p  approaches 
1-). 

Proof:  Consider  the  case  where  the  vehicle  moves  with 
inhnite  velocity  (i.e.,  v  — >  -boo);  then  the  system  is  reduced  to 
an  M/G/1  queue  (i.e.,  a  queue  with  exponentially  distributed 
inter-arrival  times,  generally  distributed  service  times,  and  a 
single  server).  The  inhnite-velocity  system  has  fewer  demands 
waiting  in  queue.  A  known  result  on  M/G/1  queues  [25] 
states  that,  after  transients,  the  total  number  of  demands,  as 
p  — >  1“,  is  very  large  with  high  probability.  Due  to  the  way 
Q  is  partitioned,  the  number  of  outstanding  demands  in  each 
subregion  is  also  very  large  with  high  probability.  In  particular, 
after  transients,  the  number  of  demands  is  very  large  with  high 
probability  at  the  time  instants  when  the  vehicle  starts  a  new 
TSP  tour.  ■ 

Lemma  4.3  and  its  proof  imply  that  in  heavy  load  U  7^  0 
with  high  probability.  Accordingly,  in  the  following  heavy¬ 
load  analysis  we  consider  the  condition  D  =  %  always  false, 
and  the  DC  policy  then  reduces  to  lines  5-9  of  Algorithm  1. 

We  refer  to  the  time  instant  /  >  0,  in  which  the  vehicle 
starts  a  new  TSP  tour  in  subregion  1  as  the  epoch  i  of  the 
policy;  we  refer  to  the  time  interval  between  epoch  i  and 
epoch  /  -b  1  as  the  zth  iteration.  Let  n^,  k  G  {1, . . . ,  r},  be  the 
number  of  outstanding  demands  serviced  in  subregion  k  during 
iteration  i.  Finally,  let  Cf  be  the  time  interval  between  the 
time  instant  the  vehicle  starts  to  service  demands  in  subregion 
k  during  iteration  i  and  the  time  instant  the  vehicle  starts  to 
service  demands  in  the  same  subregion  k  during  next  iteration 
f-b  1.  Demands  arrive  in  subregion  Qk  according  to  a  Poisson 
process  with  rate  A  =  A/r,  where  we  use  the  fact  that  the 
partition  {Qk}k=i  is  equitable  with  respect  to  /;  then,  we 
have  E  [nji|_]^]  =  AE  [Cf].  The  time  interval  is  the  sum 
of  three  components:  1)  the  time  for  the  r  travels  from  the  end 
of  one  TSP  tour  to  the  start  of  next  one;  2)  the  time  to  perform 
r  TSP  tours;  and  3)  the  time  to  provide  on-site  service  while 
performing  the  r  TSP  tours. 

The  hrst  component  is  trivially  upper  bounded  by 
2diam(Q)/v  •  r.  Given  that  a  demand  falls  in  subregion  Qk, 


7 


the  conditional  density  for  its  location  (whose  support  is  Qk) 
is  f{x)/(^ f{x)  dx^.  By  equation  (1),  we  have  that  the  ex¬ 
pected  length  of  a  TSP  tour  through  n  demands  independently 
distributed  according  to  the  density  f{x)/(^  f{x)  dx^  sat¬ 
isfies  (with  a  slight  abuse  of  notation,  we  call  such  length 
TSP(n)) 


lim 


E[TSP(n)] 


fix) 


J  fix)  dx 


dx 


=  /5tSP,2  — 


lQf^^^ix)dx  . 


(7) 


=  /3, 


where  we  have  exploited  the  fact  that,  by  definition  of  the  DC 
policy,  the  partition  {Qk}k=i  simultaneously  equitable  with 
respect  to  /  and  Pick  now  an  arbitrarily  small  (  >  0.  By 
equation  (7),  there  exists  an  n  e  N>o  such  that  for  all  n  >  n 
it  holds  E  [TSP(n)]  <  (/3  -f  C)  Then,  the  expected  length 
of  a  TSP  tour  through  demands  (with  a  slight  abuse  of 
notation,  we  call  such  length  TSP(nf^))  can  be  upper  bounded 
as: 


+00 


E  [TSP(nf)]  =  ^E  [TSP(n,^)|n,^  =  n]P  [n'y=n] 

n—0 

+00  h 

<  ^  (/3 -f  O's/ttP  =''^] ^diam(Q)P  [nf  =  n] 


n—h-\-l 

<  (/3  +  C)E 


n—0 


-I-  ndiam(Q)P  n'D  <  ri 


<ij3  +  C)\J^n^  -I- ndiam(Q)P  [nf  <  n] , 


where  in  the  first  equality  we  have  used  the  law  of  total 
expectation,  and  in  the  last  inequality  we  have  applied  Jensen’s 


inequality  for  concave  functions  in  the  form  E 


< 


■\/E[X].  By  Lemma  4.3,  there  exists  g  G  (0,  1)  such  that 
for  all  g  G  [p,  1)  there  exists  an  integer  {(g)  (possibly 
dependent  on  g)  such  that  P  [n*  <  n]  <  1/n.  Consider  any 
g  G  [p,  1)  and  assume  that  i  >  i(g),  then  E  [TSP(n^)]  < 

(/3  -I-  C)  y^E  [n'D]  +  diam(Q),  for  i  >  {(g)  and  p  e  [p,  1). 

Finally,  since  on-site  service  times  are  independent  of  the 
number  of  outstanding  demands,  the  expected  on-site  service 
requirement  for  demands  is  s  E  [nf  ] . 

Then,  we  obtain  the  following  recurrence  relation  (where 
we  define  =  E  [nf ] ): 


nf+i  =  AE  [Cf]  <  A^diam(Q)^  + 

^  j=k 

(8) 

where  k  G  {1, . . .  ,r},  i  >  i{g),  and  p  e  [p,  1). 

The  r  inequalities  above  describe  a  system  of  recur¬ 
rence  relations  that  allows  us  to  find  an  upper  bound  on 
lim  supj_j.^go  nf.  The  following  lemma  bounds  the  values  to 
which  the  limits  lim  supj_j._|_go  converge. 

Lemma  4.4  (Bound  on  steady-state  number  of  demands): 

In  heavy  load,  for  every  set  of  initial  conditions 


{ni}fce{i,...,r}.  the  trajectories  i  !->■  nf,  k  G  r}, 

satisfy 


=  lim  sup  <  /3tsp,2' 

z— >-+oo 


IgP^^ix)dx 


l2 


(l-p)2 


Proof:  Define  the  constants:  jd  =  j3  C,  and  a  = 
diam(Q)  3  r/v.  Henceforth  we  assume  that  p  €  [p,  1)  and  that 
*  >  iio),  we  simply  denote  7(p)  as  i.  Next  we  define  two 
auxiliary  systems,  System-Y  and  System-Z,  whose  trajectories 
will  be  used  to  bound  the  trajectories  i  i-G  n^.  We  define 
System-Y  (with  state  yii)  G  M’’)  as 

Vkii  +  l)  =Xa  +  x(  '^[syjii)  +  ^  ^yj{i)^ 

\  j^k 

k—1  5  _ 

+ X!  u 

7  =  1 

(9) 


^  ,  for  7  >  z, 


with  yk{i)  =  nf,  and  where  k  G  r}.  System-Y  is 

obtained  by  replacing  the  inequality  in  equation  (8)  with  an 
equality.  Pick,  now,  any  e  >  0.  From  Young’s  inequality 
\/w  <  1/(4£)  -f  ew,  for  all  w  G  M>o-  By  applying  Young’s 
inequality  in  equation  (9)  we  obtain 


Vkii  +  O  <  aI  s-f  j  (  XI 2^7 (*)  +  512^1*^*  +  ) 

V  /  \j=k  7  =  1  /  (10) 


fe-1 


+  A 


(  rB 

- f  a 

\4z;£ 


k  G  {l,...,r}. 


Next,  define  System-Z  (with  state  z(z)  G  M’’)  as 


Zk(i-l-l)  =  a|  s-f  £^  1  IJ^z^(i)  +  J^Zj(i-l-l)  j 

\  )  \j^k  j^l  ) 


fc-1 


+  A 


f  fA 

I  4v£ 


for  i  >  i, 


with  Zfc(z)  =  nf,  and  where  k  G  {1, . . .  ,r}.  The  proof  now 
proceeds  as  follows.  First,  we  show  that  the  condition  nf  = 
ykii)  =  ZkH)  for  all  k  implies  that 


A  <  Vkii)  <  Zkii),  for  all  k  and  for  all  i>i.  (12) 


Second,  we  show  that  the  trajectories  of  System-Z  are 
bounded;  this  fact,  together  with  equation  (12),  implies  that 
also  trajectories  of  variables  nf  and  yk  (i)  are  bounded.  Third, 
and  last,  we  will  compute  limsupj_j._|_go  z//c(z);  this  quantity, 
together  with  equation  (12),  will  yield  the  desired  result. 

We  prove  the  first  fact  by  induction  on  z,  i.e.,  we  prove  that 
if  <  yk(i)  <  Zk(i)  for  all  k  G  r},  then  < 

yk(t  +  1)  <  Zk(i  +  1)  for  all  k  G  {1, . . .  ,r},  where  i  >  i. 
Indeed,  it  is  immediate  to  show  that,  under  the  assumptions, 
it  holds:  n\j^^  <  2/i(7  +  1)  <  zi(i  -f  1);  moreover,  it  is  easy  to 
show  that,  under  the  assumptions,  and  if  nj+i  <  yjii  +  1)  < 
Zj{i  4-  1)  for  all  j  <  k  <  r,  then  A+i  ^  yfc-i-i(*  +  1)  < 
Zk+iii  -f  1).  The  inductive  step  then  follows  immediately. 
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We  now  turn  our  attention  to  the  second  issue,  namely 
boundedness  of  trajectories  for  System-Z  (described  in  equa¬ 
tion  (11)).  Define  6  =  X{s  +  e/3/v).  System-Z  is  a  discrete¬ 
time  system  and  can  be  rewritten  (by  adding  and  subtracting 
Zk-i{i)  in  the  second  term  between  parentheses  in  equation 
(1 1),  when  fc  >  1)  as 


Finally,  we  turn  our  attention  to  the  third  issue,  namely 
the  computation  of  the  limit  y  =  limsupj_j.^go  y(z).  Taking 
the  limsup  of  the  left-  and  right-hand  sides  of  equation  (9), 
and  noting  that,  for  j  G  {1,  ■  ■  ■  ,r},  limsupj_j.+3o  \/yjU)  = 
.^lim  supj_j._|_go  yj  (i),  since  is  continuous  and  strictly 
monotone  increasing  on  M>o,  we  obtain  that 


..  ..  +  forA:  =  l, 

Zk{l+l)={  /  N,  ^  ^ 

^l  +  (5jzfc_i(i-fl)-52;fc-i(0.  for  fce{2,...,r}. 

By  using  the  above  equation  and  by  simple  inductive  argu¬ 
ments,  it  is  possible  to  show  that  System-Z  can  be  written  in 
canonical  form  as 

+  1)  =  (1  +  i))*  ^  ^  ^ 

k-l 

“X!  <^(1 +  A:  e  r}. 

i=i 


Hence,  System-Z  can  be  written  in  compact  form  as 
z{i  -I-  1)  =  +  B{e)^z{i)  +  c(e),  where  A  is  a  matrix  in 

and  is  independent  of  e,  B{e)  G  and  each  element 
of  B{e),  say  b{e)kj,  has  the  property  limg_j.o+  b{e)kj  =  0  for 
all  fc,  j  G  r},  and  c{e)  G  (notice  that  c{s)  is  well 

defined  for  every  £  >  0).  It  is  easy  to  see  that  the  entries  of 
matrix  A  are 


^kj  — 


(1  +  g) 

g(l  +  g) 


k-l 


k-l 


(1  -f  gf-^-^ 


for  1  <  j  <  fc  —  1, 
otherwise, 


where  g  =  As.  Notice  that,  since  p  >  0,  we  have  \akj  \  =  a^j 
for  all  k  and  j.  Then,  for  each  fc  G  r},  we  have 


k-l 


i=i  i=i  i=i 

=  (1  +  g)''~^  {rg-  1)+1  <  1, 


<0 


since  g  =  Xs/r  and  p  =  As  <  1  by  assumption.  There¬ 
fore,  the  L°°-induced  norm  of  matrix  A  satisfies:  ||Al||oo  = 
maxfc  \akj  \  <  1.  Since  any  induced  norm  ||  •  ||  satisfies 
the  inequality  p{A)  <  |iAl||,  where  p{A)  is  the  spectral  radius 
of  A,  we  conclude  that  A  G  has  eigenvalues  strictly 

inside  the  unit  disk  (i.e.,  Gl  is  a  stable  matrix).  Since  the 
eigenvalues  of  a  matrix  depend  continuously  on  the  matrix 
entries,  there  exists  a  sufficiently  small  £  >  0  such  that  the 
matrix  A  +  B{e)  has  eigenvalues  strictly  inside  the  unit  disk. 
Accordingly,  having  selected  a  sufficiently  small  £,  the  solution 
i  ^  z(i)  G  of  System-Z  converges  exponentially  fast  to 

the  unique  equilibrium  point  z*{e)  =  (^Ir  —  A  —  B{e^  c{e), 
where  Ir  is  the  identity  matrix  of  size  r.  Combining  equa¬ 
tion  (12)  with  the  previous  statement,  we  see  that  the  solutions 
i  !-)■  n{i)  (where  n{i)  =  (nj,...,n^))  and  i  i->  y{i)  are 
bounded.  Thus 


limsupn(i)  <  limsupt/(i)  <  -foo.  (13) 

2— >-  +  00  Z— >-  +  00 


Vk 


A  d  "4“  A  syj  + 

i=i 


(14) 


therefore  we  have  yk  =  yj  for  all  k,j  G  r}. 

Substituting  in  equation  (14)  and  solving  for  we  obtain 

Vk  =  \  +  \l v^i-eY  +  f$)  ■  Recalling  that  this 

analysis  holds  for  every  p  G  [p,  1),  by  taking  p  arbitrarily 
close  to  1  we  obtain 


A2/32 


(15) 


(Formally,  we  should  have  written  lim^_,.i-  j/fc  (1  —  p)^  = 
X^fP /v^,  however,  as  discussed  in  Remark  3.2,  we  prefer  to 
adhere  to  the  less  formal  but  more  transparent  style  of  equation 
(15)).  Noting  that  from  equation  (13)  limsupj_,._i_go  <  Vk, 

and  recalling  that  /3  =  ^  where  C,  is  an 

arbitrarily  small  constant,  we  obtain  the  desired  result.  ■ 
We  are  now  in  a  position  to  prove  Theorem  4.2 

Proof  of  Theorem  4.2:  Define  (7*  =  limsupj_,.go  E  [C^^] ; 
then  we  have,  by  using  the  upper  bound  on  E  [Cf  ]  in  equation 
(8)  and  neglecting  constant  terms. 


=  limsupE  [Cf] 

z— >-oo 


Q  ^  ^  _ 

<  —  ^  +  s  ^  P 

^  i=i  i=i 

r  XP  f3  r  gXP"^ 

“  (1  —  p)  (1  —  p)2 


Hence,  in  the  limit  p  - 
r  XP^ 


< 


1  ,  we  have 


/7j2 


(1  —  p)2 


(1  —  p)2 


Consider  a  random  demand  that  arrives  in  subregion  k.  Its 
expected  steady-state  system  time,  T  ,  will  be  upper  bounded, 
as  p  ^  1-,  by  <  i  <  ^C'^  + 

where  we  used  the  fact  that,  as  p  — >  1“,  the  travel  time  along 
a  TSP  tour  is  negligible  compared  to  the  on-site  service  time 
requirement.  Unconditioning,  we  obtain  the  claim.  ■ 


C.  Discussion 

The  DC  policy  is  optimal  in  light  load  (Theorem  4.1); 
moreover,  if  we  let  r  — >  oo,  the  DC  policy  achieves  the  heavy¬ 
load  lower  bound  (4)  for  unbiased  policies  (Theorem  4.2). 
Therefore  the  DC  policy  is  both  optimal  in  light  load  and 
arbitrarily  close  to  optimality  in  heavy  load,  and  stabilizes  the 
system  in  every  load  condition  (since  it  stabilizes  the  system 
in  the  limit  p  — >  1“).  Notice  that  with  r  =  10  the  DC  policy 
is  already  guaranteed  to  be  within  10%  of  the  optimal  (for 
unbiased  policies)  performance  in  heavy  load. 
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The  implementation  of  the  DC  policy  does  not  require  the 
knowledge  of  A,  s,  and  w;  however,  if  one  chooses  r  >  1, 
the  computation  of  a  simultaneously  equitable  r-partition  of 
Q  requires  the  knowledge  of  /.  Hence,  if  r  — >  +oo,  the  policy 
is  (i)  provably  optimal  in  both  light  and  heavy  load,  and  (ii) 
adaptive  with  respect  to  arrival  rate  A,  expected  on-site  service 
s,  and  vehicle’s  velocity  v.  If,  instead,  r  =  1,  the  policy  is 
(i)  provably  optimal  in  light  load  and  within  a  factor  2  of 
the  optimal  performance  in  heavy  load,  and  (ii)  adaptive  with 
respect  to  arrival  rate  A,  expected  on-site  service  s,  vehicle’s 
velocity  v,  and  spatial  density  /;  in  other  words,  when  r  =  1, 
the  DC  policy  adapts  to  all  problem  data.  (These  results  should 
be  compared  with  the  discussion  in  Section  III  about  the  non- 
adaptive  nature  of  the  UTSP  policy,  which,  for  example,  might 
become  unstable  if  A  increases.) 

An  approximate  algorithm  to  compute,  with  arbitrarily 
small  error,  an  r-partition  that  is  simultaneously  equitable  with 
respect  to  two  measures  can  be  found  in  [13]  (specifically, 
see  discussion  on  page  621);  in  the  particular  case  when  the 
density  /  is  uniform,  the  problem  of  finding  an  r-partition 
that  is  simultaneously  equitable  with  respect  to  /  and 
becomes  trivial.  Given  a  simultaneously  equitable  r-partition 
of  Q,  the  subregions  can  be  indexed  using  the  following  pro¬ 
cedure:  (i)  a  TSP  tour  through  the  medians  of  the  subregions 
is  computed,  (ii)  subregions  are  indexed  according  to  the  order 
induced  by  this  TSP  tour.  In  practice,  the  DC  policy  would 
be  implemented  by  allowing  the  vehicle  to  skip  subregions 
with  no  demands.  Note  that  if  we  assume  that  /  is  known  (as 
it  must  be  the  case  for  the  implementation  of  the  DC  policy 
with  r  >  1),  then  instead  of  using  the  empirical  median 
one  should  directly  use  the  exact  median  (which  is  the 
solution  of  a  strictly  convex  problem). 

The  DC  policy  requires  on-line  solutions  of  possibly  large 
TSPs;  therefore,  practical  implementations  of  this  policy 
should  rely  on  heuristics  or  on  polynomial-time  approximation 
schemes  for  the  solution  of  TSPs,  such  as  Lin-Kernighan’s 
or  Christofides’  (see  Section  II-B).  We  will  further  discuss 
computation  times  in  Section  VIII. 

Finally,  the  DC  policy  with  r  =  1  is  similar  to  the  generation 
policy  presented  in  [21];  in  particular,  the  generation  policy 
has  the  same  performance  guarantees  of  the  DC  policy  with 
r  =  1.  However,  the  generation  policy  is  analyzed  only  for  the 
case  of  uniform  spatial  density  /,  and  its  implementation  does 
require  the  knowledge  of  /  (while,  as  discussed  before,  the 
DC  policy  with  r  —  1  adapts  to  all  problem  data,  including 
/).  The  part  of  Algorithm  1  in  lines  5-9  is  similar  to  the 
PART-TSP  policy  presented  in  [20];  however,  the  PART-TSP 
policy  is  only  informally  analyzed  by  assuming  steady-state 
conditions,  in  particular  no  proof  of  stability  and  convergence 
is  provided. 

In  the  next  section  we  present  and  analyze  another  single¬ 
vehicle  policy,  the  Receding  Horizon  (RH)  policy,  that  applies 
ideas  of  receding-horizon  control  to  dynamic  vehicle  routing 
problems. 

V.  The  Single-Vehicle  Receding  Horizon  Policy 

We  describe  the  Receding  Horizon  (RH)  policy  in  Algo¬ 
rithm  2,  where  D  is  the  set  of  outstanding  demands  waiting  for 


service  and  p  is  the  current  position  of  the  vehicle;  moreover, 
given  a  tour  T  of  79,  a  fragment  of  T  is  a  connected  subgraph 
of  T.  If  79  7^  0,  the  vehicle  selects  a  random  ry-fragment 


Algorithm  2;  Receding  Horizon  (RH)  Policy 
input:  Scalar  rj  £  (0,  1) 

1  if  79  =  0  then 

2  Let  Pi  be  the  point  minimizing  the  sum  of  distances  to 
demands  serviced  in  the  past  (the  initial  condition  for  is 
a  random  point  in  Q);  if  p  7^  Pf  move  to  Pf  otherwise 
stop. 

3  else 

4  repeat 

5  Compute  the  TSP  tour  through  all  demands  in  79. 

6  Uniformly  randomly  choose,  independently  from  the 
past,  a  fragment  of  length  p  TSP(79)  of  such  TSP 
tour. 

7  Service  the  selected  fragment  starting  from  the 
endpoint  closest  to  the  current  position. 

8  until  79  =  0 

9  Repeat. 


of  the  TSP  tour  through  the  demands  in  79  (i.e.,  a  fragment 
of  length  7]  TSP (79)  of  such  tour);  note  that  the  horizon  is 
not  fixed,  but  it  is  adjusted  according  to  the  cost  of  servicing 
the  outstanding  demands.  In  general,  the  performance  of  the 
system  will  depend  on  the  choice  of  the  parameter  p,  which 
will  be  discussed  after  the  analysis  of  the  policy.  One  can 
easily  show  that  the  RH  policy  is  unbiased. 

A.  Stability  and  Performance  of  the  RH  Policy 

The  RH  policy  is  attractive  since  it  can  be  implemented 
without  any  knowledge  of  the  underlying  demand  generation 
process  (in  particular,  without  any  knowledge  of  /).  In  this 
section,  we  study  the  stability  of  the  RH  policy  and  we  discuss 
its  performance. 

1)  Stability  of  the  RH  Policy:  In  general,  the  stability 
properties  of  receding  horizon  controllers  deteriorate  as  the 
horizon  becomes  shorter;  remarkably,  the  RH  policy  is  stable 
in  every  load  for  every  0  <  p  <  1. 

Theorem  5.1  (Stability  of  RH  policy):  As  g  —>■  1~,  the  sys¬ 
tem  time  for  the  RH  policy  satisfies 

A/3S2I2I 

TMv)  <  ^2(r-g)2’  ^  ^  1)’ 

where  |Q|  is  the  area  of  Q  and  /?q  2  is  the  constant  appearing 
in  equation  (2). 

Proof:  By  following  the  same  arguments  as  in  Lemma 
4.3,  one  can  show  that,  in  heavy  load,  D  f  %  with  high 
probability.  Accordingly,  in  the  following  heavy-load  analysis 
we  consider  the  condition  79  =  0  always  false,  and  the  RH 
policy  then  reduces  to  lines  4-8  of  Algorithm  2. 

We  refer  to  the  time  instant  ti,  i  >  0,  in  which  the  vehicle 
computes  a  new  TSP  tour  as  the  epoch  i  of  the  policy;  we 
refer  to  the  time  interval  between  epoch  i  and  epoch  7  +  1  as 
the  7th  iteration  and  we  refer  to  its  length  as  the  cost  Ci.  Let 
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Hi  be  the  number  of  outstanding  demands  at  epoch  i;  with 
a  slight  abuse  of  notation,  we  denote  by  TSP(ni)  the  length 
of  the  TSP  tour  through  the  outstanding  demands  at  epoch  i. 
Since  an  77-fragment  is  randomly  chosen,  the  expected  number 
of  demands  left  unserviced  during  iteration  i  is  (1  —  77)  E  [n^]; 
then. 


E[nj+i]=  A  £[(7*]  -I-  (l-?7)E[ni] 

newly  arrived  demands  demands  left  unserviced  during  iteration  i 

The  expected  number  of  demands  that  receive  service  during 
iteration  i  is  r/E  [n^].  Moreover,  the  time  to  reach  the  starting 
endpoint  of  the  selected  fragment  is  bounded  by  diam(Q)/7;. 
Then,  the  expected  time  length  of  iteration  i  can  be  upper 
bounded  as 

E  [Ci]  <  diam(Q)/v  +  -  E  [TSP(r7i)]  +  spE  [rii] 

<  diam(Q)/v  +  -  Pq,2\AQ\  K]  +  S77E  [m], 

(16) 


where  we  have  applied  the  deterministic  bound  in  equation  (2) 
and  Jensen’s  inequality  for  concave  functions.  Therefore,  we 
obtain 

E  [ui+i]  <  Adiam(Q)/v  +  A  ^  Pq,2\/\Q\  VE  [rii] 

+  gr]E[ni]  +  (1  -  'n)E[ni]. 

By  using  the  same  techniques  as  in  the  proof  of  Lemma  4.4 
(in  particular,  the  eigenvalue  of  the  “bounding”  discrete-time 
linear  system  is  /r  =  1  —  77  (1  —  p),  whose  magnitude  is  strictly 
less  than  one  for  all  77  €  (0,  1)),  it  is  possible  to  show  that 
lim  supj_^gQ  E  [rii]  =  n  can  be  upper  bounded  as 

-  ^  i/A/3q,2^  /a^/3|  ,|Q|  4Adiam(g)y 

7;(1  — p)  \J  (1  —  g)^  7777(1  — p)  y  ’ 

for  all  77  €  (0,  1).  Define  C  =  lim supj_j._|_go  E  [Ci].  Consider 
a  random  demand  in  steady  state;  the  expected  number  of 
iterations  before  such  demand  is  scheduled  for  service  is  given 
by  PV{^~'nY ■  Hence,  the  expected  steady-state  system 

time  of  such  demand,  T,  will  be  upper  bounded  for  all  77  G 
(0,  1)  and  as  p  — >  1“  according  to: 


—  1  -  1  - 
T  <  +  -srjn  +  C^  ^77(1  -  77)^  < 

P—0 


A/3|,2lS| 

772  (1  -p)2’ 


where  we  used  the  fact  that,  as  p  — >  1“,  the  travel  time  along 
an  77-fragment  is  negligible  compared  to  the  on-site  service 
time  requirement.  This  concludes  the  proof. 


Since,  by  Theorem  5.1,  the  RH  policy  stabilizes  the  system 
in  the  limit  p  — >  1”,  it  also  stabilizes  the  system  for  every 
load  factor  p. 

2 )  Performance  of  the  RH  Policy:  In  light  load,  the  RH 
policy  becomes  identical  to  the  DC  policy,  and  therefore  it  is 
optimal  (see  Theorem  4.1).  For  the  case  p  — )■  1”,  because 
of  the  dependencies  that  arise  among  demands’  locations, 
we  were  unable  to  obtain  rigorous  upper  bounds  on  the 
system  time  that  are  well  matched  by  numerical  experiments. 
However,  we  now  present  a  heuristic  analysis  of  the  RH  policy 


that  provides  interesting  insights  on  its  behavior  and  suggests 
bounds  on  its  heavy-load  system  time  that  are  well  matched 
by  simulation  results. 

The  service  of  an  77-fragment  introduces  dependencies 
among  the  locations  of  the  demands  that  are  left  unserviced; 
therefore,  even  though  the  number  of  demands,  as  p  — >  1~, 
becomes  large  with  probability  1,  the  result  in  equation  (1) 
(that  requires  a  set  of  independently  distributed  demands) 
formally  does  not  hold.  However,  due  to  the  randomized 
selection  of  the  77-fragments,  it  is  quite  natural  to  conjecture 
that  equation  (1)  still  provides  a  good  estimate  on  the  average 
lengths  of  the  TSP  tours  that  are  computed.  The  rationale 
behind  this  conjecture  is  the  following:  When  77  is  close  to 
1,  most  of  the  outstanding  demands  are  serviced  during  one 
iteration,  and  the  next  TSP  tour  is  mostly  through  newly 
arrived  demands,  that  are  independent  by  the  assumptions 
on  the  demand  generation  process;  when,  instead,  77  is  close 
to  zero,  the  randomized  selection  of  short  fragments  only 
introduce  “negligible”  correlations. 

The  above  observations  motivate  us  to  use 
/?TSP,2  Jq  dx  (as  it  would  be  dictated  by  equation 

(1))  in  equation  (16),  instead  of  /3q,2\/|  Q\  (as  it  is  dictated 
by  equation  (2)).  Then,  by  repeating  the  same  steps  as  in 
the  proof  of  Theorem  4.2,  Lemma  4.4,  and  Theorem  5.1,  we 
would  obtain  the  following  result  for  all  77  G  (0,  1) 


TRuiv)  <  /3tsp,2 


The  upper  bound  in  equation  (17)  is 
than  the  upper  bound  in  Theorem  5.1 
Jensen’s  inequality  for  concave  functions 

_ r  1 1/2 

V\Q\\lQfY'  ' 


as  p  — )■  1  . 

(17) 

generally  tighter 
(notice  that  by 

lQf^^^ix)dx  < 


I  dx 


=  y/fOj),  but  unlike  the  upper  bound 
in  Theorem  5.1  it  has  not  been  established  formally.  Simula¬ 
tion  results  (see  Section  VIII)  indeed  confirm  the  upper  bound 
in  equation  (17). 

Finally,  note  that  when  77  is  close  to  zero,  the  RH  policy 
is  conceptually  similar  to  the  DC  policy  with  r  — >  00,  so  we 
also  speculate  that  in  heavy  load  the  performance  of  the  RH 
policy  improves  as  the  horizon  becomes  shorter.  Simulation 
results  (see  Section  VIII)  indeed  confirm  this  behavior. 


B.  Discussion 

The  RH  policy  is  optimal  in  light  load  and  stabilizes  the 
system  in  every  load  condition  (these  two  statements  have  been 
established  rigorously).  The  implementation  of  the  RH  policy 
does  not  require  the  knowledge  of  A,  s,  v  and  /.  Therefore,  the 
RH  policy  adapts  to  arrival  rate  A,  expected  on-site  service  s, 
vehicle’s  velocity  v,  and  spatial  density  /;  in  other  words,  the 
RH  policy  adapts  to  all  problem  data. 

In  Section  IV  we  saw  that  also  the  DC  policy  with  r  =  1 
adapts  to  all  problem  data.  Simulation  results  (see  Section 
VIII)  show  that  in  heavy  load  the  RH  policy  with  a  short 
horizon  77  (say,  77  «  0.2)  performs  better  than  the  DC  policy 
with  r  =  1  (the  light-load  behavior  is  clearly  the  same). 
Therefore,  to  date,  the  best  available  policy  for  the  1-DTRP 
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that  is  unbiased  and  adapts  to  all  problem  data  seems  to  be 
the  RH  policy  with  rj  «  0.2. 

It  is  natural  to  wonder  how  the  RH  policy  performs  if 
the  p-fragment  is  not  selected  randomly,  but  it  is  instead 
selected  according  to  the  rule:  Find  the  fragment  of  length 
T]  TSP(I?)  that  maximizes  the  number  of  reached  demands.  In 
other  words,  the  vehicle  now  looks  for  a  maximum-reward  rj- 
fragment.  Clearly,  Theorem  5.1  still  applies.  Simulation  results 
(see  [26])  show  that  this  spatially-biased  variant  of  the  RH 
policy  appears  to  perform  better  in  heavy  load  than  an  optimal 
unbiased  policy  but  not  as  well  as  an  optimal  biased  policy. 

As  for  the  DC  policy,  the  RH  policy  requires  on-line  solu¬ 
tions  of  possibly  large  TSPs;  therefore,  practical  implementa¬ 
tions  of  this  policy  should  rely  on  heuristics  or  on  polynomial¬ 
time  approximation  schemes  for  the  solution  of  TSPs,  such 
as  Lin-Kernighan’s  or  Christofides’.  We  will  further  discuss 
computation  times  in  Section  VIII. 

VI.  Adaptive  and  Distributed  Policies  for  the 
m-DTRP 

In  this  section  we  extend  the  previous  single-vehicle  policies 
to  the  multi-vehicle  case.  First,  we  prove  that  in  heavy  load, 
given  a  single-vehicle,  optimal  unbiased  policy  tt*  and  m 
vehicles,  there  exists  an  unbiased  tt* -partitioning  policy  that 
is  optimal.  This  result  and  the  optimality  of  the  SQM  policy 
(which  is  a  partitioning  policy  that  is  optimal  in  light  load) 
lead  us  to  propose,  for  the  multi-vehicle  case,  partitioning 
policies  that  use  the  DC  policy  or  the  RH  policy  as  single¬ 
vehicle  policies.  Finally,  we  discuss  how  a  partition  can  be 
computed  in  a  distributed  fashion,  so  that  the  proposed  policies 
are  amenable  to  distributed  implementation. 

A.  Optimality  of  Partitioning  Policies  in  Heavy  Load 

The  following  theorem  shows  the  optimality  of  a  specific 
type  of  partitioning  policy. 

Theorem  6.1  (Optimal  partitioning  in  heavy  load):  As 
p  — >  1“,  given  a  single-vehicle,  optimal  unbiased  policy  tt* 
and  m  vehicles,  a  tt* -partitioning  policy  using  an  m-partition 
which  is  simultaneously  equitable  with  respect  to  /  and 
is  an  optimal  unbiased  policy  for  the  m-DTRR 

Proof:  Let  tt*  be  an  optimal  unbiased  policy  (e.g.,  the 
DC  policy  with  r  — )■  oo).  Construct  an  m-partition  {Qk}T=i 
of  Q  that  is  simultaneously  equitable  with  respect  to  /  and 
(such  partition  is  guaranteed  to  exist  as  discussed  in 
Section  II);  assign  one  vehicle  to  each  subregion.  Each  vehicle 
executes  the  single-vehicle  unbiased  policy  tt*  to  service  de¬ 
mands  that  fall  within  its  own  subregion.  The  probability  that  a 
demand  falls  in  subregion  Qk  is  equal  to  f(x)  dx  =  1/m. 
Notice  that  the  arrival  rate  Afc  in  subregion  Qk  is  reduced 
by  a  factor  f{x)dx,  i.e.,  Afc  =  A  f{x)dx  =  A/m; 
therefore,  the  load  factor  in  subregion  Qk  (where  only  one 
vehicle  provides  service)  is  Qk  =  XkS  =  Xs/m  =  p  <  1,  in 
particular  the  necessary  condition  for  stability  is  satisfied  in 
every  subregion.  Finally,  given  that  a  demand  falls  in  subregion 
Qk,  the  conditional  density  for  its  location  (whose  support  is 
Qk)  is  /(2;)/(/q^  f{x)dx^  =  mf{x).  Then,  by  the  law  of 


Thus,  the  bound  in  equation  (4)  is  achieved.  We  finally  show 
that  such  multi-vehicle  policy  is  indeed  unbiased.  In  fact, 
assume  steady-state  conditions,  and  let  X  be  the  location  of  a 
randomly  chosen  demand,  T  be  its  system  time  and  S  be  an 
arbitrary  subset  of  Q;  then,  by  the  law  of  total  expectation 
we  have:  E  [T  |A:  €  5]  =  e|e  [T  |A:  €  5  n  Qfc]  |A:  e  S  . 
Since  tt*  is  unbiased,  it  must  hold  E  [T  |2f  €  5  n  Qk]  = 
E  [T  |X  €  Qfcj;  moreover,  from  the  previous  analysis,  we  have 
E  [T  |X  G  Qk]  =  Tt^*,  which  is  a  constant  independent  of  Qk- 
Hence,  we  finally  obtain  E  [T  |  AT  €  5]  =  T,r*  ■  The  claim  then 
follows  from  the  fact  that  S  was  chosen  arbitrarily.  ■ 

Remark  6.2:  Theorem  6.1  proves  Conjecture  1  in  [12] 
(made  only  for  uniform  spatial  densities). 

Remark  6.3:  It  is  immediate  to  generalize  Theorem  6.1  as 
follows.  Let  TT  be  a  single- vehicle  unbiased  policy  such  that 
in  heavy  load  <  7.  As  p  — )■  1“,  given  such  policy  tt 

and  m  vehicles,  a  7r-partitioning  policy  using  an  m-partition 
which  is  simultaneously  equitable  with  respect  to  /  and  is 
always  within  a  factor  7  of  the  optimal  unbiased  performance. 

The  proof  of  Theorem  6.1  relies  on  m-partitions  that  are 
simultaneously  equitable  with  respect  to  /  and  We  next 
show  in  Theorem  6.4  that  a  tt* -partitioning  policy  using  an 
m-partition  which  is  equitable  with  respect  to  may  be 
unstable;  moreover,  we  show  in  Theorem  6.5  that,  in  general,  a 
7r*-partitioning  policy  using  an  m-partition  which  is  equitable 
with  respect  to  /  does  not  achieve  the  optimal  unbiased 
performance,  however  it  is  always  within  a  factor  m  of  it. 

Theorem  6.4  (Unstable  partitioning  policies):  Given  a 
single-vehicle,  optimal  unbiased  policy  tt*  and  m  vehicles,  a 
TT* -partitioning  policy  using  an  m-partition  which  is  equitable 
with  respect  to  may  be  unstable  for  values  of  p  strictly 
less  than  one. 

Proof:  The  arrival  rate  in  subregion  Qk,  k  G  m}, 

is  Afc  =  A  f(x)  dx,  and  therefore  the  load  factor  in 
subregion  Qk  (where  only  one  vehicle  provides  service)  is 
Qk  =  X  f{x)  dx  s.  In  general,  an  m-partition  that  is  equi¬ 
table  with  respect  to  is  not  equitable  with  respect  to  /. 
If  this  is  the  case,  there  exists  k  G  {1, . . . ,  m}  and  £  >  0  such 
that  J’g_  f{x)dx=  1/m-l-e:;  then,  Qk  =  Q+eXs  =  p(l-|-em). 
Hence,  for  every  value  of  p  such  that  1/(1  +  em)  <  p  <  1, 
such  multi-vehicle  policy  is  unstable  (since  in  subregion  Qk 
one  has  Qk  >  1),  even  though  the  load  factor  p  is  strictly  less 
than  one. 
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Theorem  6.5  (Suboptimal  partitioning  in  heavy  load):  As 
— >  1“,  given  a  single-vehicle,  optimal  unbiased  policy  tt* 
and  m  vehicles,  a  tt* -partitioning  policy  using  an  m-partition 
which  is  equitable  with  respect  to  /  does  not  achieve,  in 
general,  the  optimal  unbiased  performance,  however  it  is 
always  within  a  factor  m  of  it. 

Proof:  Let  tt*  be  an  optimal  unbiased  policy.  The  environ¬ 
ment  Q  is  partitioned  into  m  subregions  Q^,  k  G  {1,  ■  •  ■  ,rn}, 
such  that  f{x)dx  =  1/m  for  all  k,  and  one  vehicle  is 
assigned  to  each  subregion.  Each  vehicle  executes  the  single¬ 
vehicle  policy  TT*  to  service  demands  that  fall  within  its  own 
subregion.  By  arguments  similar  to  those  in  the  proof  of 
Theorem  6.1,  one  can  show  that 


as  Q 


(19) 


Define  the  vector 


We  will  denote  by  ||2;||2  and  ||2;||i  the  2-norm  and  the  1-norm, 
respectively,  of  vector  z.  Notice  that  ||z||i  =  Jq  f^/‘^{x)  dx. 
Since  ||z||i  <  YTn||z||2  for  any  vector  z  e  M™,  equation  (19) 
becomes 


y  _  J_  /^TSP,2 

m  2 


/5tSP,2 


Alklli  ^  1 

(1  —  g)'^  ~  m 
2 


/5fsP,2 


A| 


2  mv^  {1  — 


Igf^^^{x)dx 


-)2  9,2 


=  t; 


Uj 


as  g  - 


B.  Distributed  Policies  for  the  m-DTRP  and  Discussion 

The  optimality  of  the  SQM  policy  and  Theorem  6.1  suggest 
the  following  distributed  multi-vehicle  version  of  the  DC 
policy:  1)  the  vehicles  compute  in  a  distributed  way  an  m- 
median  of  Q  that  induces  a  Voronoi  partition  that  is  equitable 
with  respect  to  /  and  2)  the  vehicles  assign  themselves 
to  the  subregions  (thus,  there  is  a  one-to-one  correspondence 
between  vehicles  and  subregions);  and  3)  each  vehicle  exe¬ 
cutes  the  single-vehicle  DC  policy  to  service  demands  that  fall 
within  its  own  subregion,  by  using  the  median  of  the  subregion 
instead  of  P/.  (In  an  identical  way,  one  can  obtain  a  multi¬ 
vehicle  version  of  the  RH  policy.) 

If  there  exists,  given  Q  and  /,  an  m-median  of  Q  that 
induces  a  Voronoi  partition  that  is  equitable  with  respect  to 
/  and  then  the  above  policy  is  optimal  both  in  light 

load  and  arbitrarily  close  to  optimality  in  heavy  load,  and 
stabilizes  the  system  in  every  load  condition.  There  are  two 
main  issues  with  the  above  policy,  namely  (i)  existence  of 
an  m-median  of  Q  that  induces  a  Voronoi  partition  that  is 
equitable  with  respect  to  /  and  f^^^,  and  (ii)  how  to  compute  it 
in  a  distributed  way.  In  [27],  we  showed  that  for  some  choices 
of  Q  and  /  a  median  Voronoi  diagram  that  is  equitable  with 
respect  to  /  and  fails  to  exist.  On  the  other  hand,  in  [28], 
we  presented  a  synchronous,  distributed  partitioning  algorithm 
that,  for  any  possible  choice  of  Q  and  /,  provides  a  partition 
of  Q  that  is  equitable  with  respect  to  /  and  represents  a 
“good”  approximation  of  a  median  Voronoi  diagram  (see  [28] 
for  details  on  the  metrics  that  we  use  to  judge  “closeness”  to 
median  Voronoi  diagrams).  If  an  m-median  of  Q  that  induces  a 
Voronoi  partition  that  is  equitable  with  respect  to  /  exists,  the 
algorithm  will  locally  converge  to  it.  Accordingly,  we  define 
the  multi- vehicle  Divide  &  Conquer  policy  as  follows: 


where  is  the  optimal  unbiased  performance.  In  general,  an 
m-partition  that  is  equitable  with  respect  to  /  is  not  equitable 
with  respect  to  therefore,  in  general,  the  inequality 

||z||i  <  YTri||z||2  is  strict  and  T  is  larger  than  the  optimal 
unbiased  performance. 

On  the  other  hand,  lkl|2  <  ll^lll  for  any  vector  z  €  M"*, 
therefore  by  using  the  same  reasoning  as  in  equation  (19)  we 
obtain 


Algorithm  3:  Multi- Vehicle  DC  (m-DC)  Policy 

1  The  vehicles  run  the  distributed  partitioning  algorithm  defined 
as  a  set  of  differential  equations  in  [28],  by  using  as  input 
measure  /. 

2  Simultaneously,  each  vehicle  executes  the  single-vehicle  DC 
policy  inside  its  own  subregion  (the  assignment  of  vehicles  to 
subregions  is  a  by-product  of  the  algorithm  in  [28]). 


_  1  /^TSP,2 


A 

EtSP,2 


Allzll 


2  < 


1  /3tSP,2 


(1  —  g) 


m 

n2 


m  (1  — 


=  m  Ty ,  as  p  — >  1 


where,  again,  is  the  optimal  unbiased  performance.  Note 
that  in  this  case  the  resulting  multi-vehicle  policy  might  be 
spatially-biased.  ■ 

Remark  6.6:  It  is  immediate  to  generalize  Theorem  6.5  as 
follows.  Let  TT  be  a  single-vehicle  unbiased  policy  such  that 
in  heavy  load  <  7.  As  p  — 1~,  given  such  policy  tt 

and  m  vehicles,  a  7r-partitioning  policy  using  an  m-partition 
which  is  equitable  with  respect  to  /  is  always  within  a  factor 
7  m  of  the  optimal  unbiased  performance. 


According  to  Remark  6.6  the  m-DC  policy  is  within  a  factor 
{1+1 /r)  m  of  the  optimal  unbiased  performance  in  heavy  load 
(since  the  algorithm  in  [28]  always  provides  a  partition  of  Q 
that  is  equitable  with  respect  to  /),  and  stabilizes  the  system 
in  every  load  condition.  In  general,  the  m-DC  policy  is  only 
suboptimal  in  light  load;  note,  however,  that  the  computation 
of  the  global  minimum  of  the  Weber  function  Hm  (which  is 
non-convex  for  m  >  1)  is  difficult  for  m  >  1  (it  is  NP-hard 
for  the  discrete  version  of  the  problem,  see  [18]);  therefore, 
for  m  >  1,  suboptimality  has  also  to  be  expected  from  any 
practical  implementation  of  the  SQM  policy.  If  an  m-median 
of  Q  that  induces  a  Voronoi  partition  that  is  equitable  with 
respect  to  /  exists,  the  partitioning  algorithm  used  by  the  m- 
DC  will  locally  converge  to  it,  thus  we  say  that  the  m-DC 
policy  is  “locally”  optimal  in  light  load. 
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Note  that,  when  the  density  is  uniform,  a  partition  that  is 
equitable  with  respect  to  /  is  also  equitable  with  respect  to 
therefore,  when  the  density  is  uniform  the  m-DC  policy 
(with  r  — >  +oo)  is  an  optimal  unbiased  policy  in  heavy  load 
(see  Theorem  6.1).  The  m-DC  policy  adapts  to  arrival  rate  A, 
expected  on-site  service  s,  and  vehicle’s  velocity  v;  however, 
it  requires  the  knowledge  of  /.  If  this  is  not  available,  the 
vehicles  might  use  an  empirical  distribution  function  for  the 
demand  locations  (which  might  be  computed  either  through 
a  centralized  algorithm  or  through  a  distributed  algorithm). 
Since  the  demand  locations  are  i.i.d.,  the  empirical  distribution 
will  converge  to  the  true  distribution,  and  therefore  after  a 
transient  (whose  length  depends  on  the  arrival  rate  A)  the 
vehicles  will  provide  a  system  time  that  agrees  with  the  bounds 
derived  under  the  assumption  that  /  is  known  by  the  vehicles. 
The  key  feature  of  the  partitioning  algorithm  in  [28]  is  that  it  is 
distributed',  therefore,  the  m-DC  policy  is  indeed  a  distributed 
control  policy.  The  multi-vehicle  version  of  the  RH  policy 
(which  we  call  m-RH  policy)  can  be  defined  in  an  identical 
way.  Its  properties  are  summarized  in  Table  II  in  Section  I. 


VII.  On  the  Case  with  Zero  On-Site  Service  Time 


In  the  formulation  of  the  m-DTRP  we  assumed  that  s  >  0 
(see  Section  III).  However,  it  is  of  interest  to  also  study  the 
case  s  =  0,  since  in  some  applications  the  on-site  service 
time  might  actually  be  zero  (or  negligible).  We  show  that, 
perhaps  surprisingly,  when  s  =  0  a  tt* -partitioning  policy 
using  an  m-partition  which  is  equitable  with  respect  to 
is  an  optimal  unbiased  policy  in  heavy  load.  Hence,  when 
s  =  0,  optimality  does  not  require  simultaneous  equitability, 
a  considerable  simplification. 

Specifically,  when  s  =  0,  the  light-load  regime  is  defined  as 
A  — >  0+,  while  the  heavy-load  regime  is  defined  as  A  — >  -foo. 
Then,  it  is  possible  to  verify  that  all  the  performance  bounds 
in  Sections  III,  IV,  and  V  hold  by  simply  substituting  p  =  0, 
and  by  formally  changing  the  limits  p  — >  O’*"  and  p  — >  1“  with 
A  — >  0+  and  A  — >  -foo.  For  example,  equation  (4)  reads  Ty  > 

/StSP  2  ^  f/o  ^  .  rj,,  ■  .... 

I  '  ^  ^2^2 - as  A  — )■  -foo.  The  major  difference  is 

about  which  type  of  partition  one  should  use  in  heavy  load 
(i.e.,  when  A  is  large).  We  have  the  following  result. 

Theorem  7.1  (Optimal  partitioning  when  s  =  0).'  Assume 
s  =  0.  As  A  — >  -foo,  given  a  single-vehicle,  optimal  unbiased 
policy  TT*  and  m  vehicles,  a  tt* -partitioning  policy  using  an 
m-partition  which  is  equitable  with  respect  to  is  an 

optimal  unbiased  policy  for  the  m-DTRR 

Proof:  We  only  sketch  the  proof,  since  it  is  very  similar 
to  the  proof  of  Theorem  6.1.  Let  tt*  be  an  optimal  unbiased 
policy;  by  considering  an  m-partition  {Qk}'k=i  2 
equitable  with  respect  to  and  by  using  arguments  similar 
to  the  ones  in  the  proof  of  Theorem  6.1,  one  can  show  that, 
as  A  — y  -foo. 


^  =  E 


f{x)  dx 


/^TSP,2 


Jo  d.^ 


k=l 

?2 


(20) 


/5tSP,2 


dx 


where  in  the  last  equality  we  have  used  the  fact  that 
Iq  fix)  dx  =  1.  Thus,  the  bound  in  equation  (4)  is 
achieved.  The  proof  that  this  multi-vehicle  policy  is  unbiased 
requires  arguments  identical  to  those  presented  in  the  proof  of 
Theorem  6.1,  and  therefore  it  is  omitted. 

Remark  7.2:  It  is  immediate  to  generalize  Theorem  7.1  as 
follows.  Let  TT  be  a  single-vehicle  unbiased  policy  such  that  in 
heavy  load  T^^/T-^  <  7.  Assume  s  =  0.  As  A  — )■  -foo,  given 
such  policy  tt  and  m  vehicles,  a  7r-partitioning  policy  using  an 
m-partition  which  is  equitable  with  respect  to  is  always 
within  a  factor  7  of  the  optimal  unbiased  performance. 

■ 

Theorem  7.1  is  very  important  from  a  practical  viewpoint, 
since  it  means  that  when  s  =  0  optimality  does  not  require 
simultaneous  equitability.  In  light  of  Theorem  7.1,  when  s  = 
0  the  m-DC  policy  should  be  defined  in  the  same  way  as 
Algorithm  3,  with  the  exception  that  the  partitioning  algorithm 
in  [28]  should  partition  Q  according  to  (instead  of  /). 
Since  the  partitioning  algorithm  in  [28]  is  then  guaranteed  to 
converge  to  a  partition  that  is  equitable  with  respect  to 
we  conclude  that  when  s  =  0  the  distributed  m-DC  policy 
(with  the  modification  that  equitability  should  be  with  respect 
to  and  with  r  — >  -foo)  is  an  optimal  unbiased  policy  in 
heavy  load  for  any  density  /. 

VIII.  Simulation  Experiments 

In  this  section  we  discuss,  through  the  use  of  simulations, 
the  heavy-load  behavior  of  both  the  DC  and  the  RH  policy,  and 
we  comment  on  their  relative  performance.  All  simulations  are 
performed  on  a  machine  with  a  2.4GHz  Intel  Core  Duo  pro¬ 
cessor  and  4GB  of  RAM.  The  code  is  written  in  Matlab®7.4 
with  external  calls  to  the  program  linkern,  for  which  we 
set  a  running  time  bound  of  one  second.  In  all  simulations  we 
consider  a  circular  environment  Q  =  {{x,y)  G  |  a;^  -f  < 
I/tt}  (hence,  the  area  of  Q  is  1),  a  vehicle’s  velocity  v  =  1, 
and  an  on-site  service  time  uniformly  distributed  in  the  interval 
[0,  1]  (thus  s  =  0.5).  The  spatial  density  f  :  Q  ^  K>o 
used  in  the  simulation  experiments  is  the  piecewise  constant 
function  f{x)  =  ^XQiix)  +  Y^Xa2ix),  where  xsix)  is  the 
indicator  function  for  the  set  5  C  Q  (having  the  value  1  when 
X  G  S  and  0  otherwise),  Qi  =  {{x,y)  €  |  <  e/ir}, 

Qi  =  Q\  Qi,  and  e,  6  G  (0,  1).  If  1  —  5  =  e,  then  the  density 
/  is  uniform;  if,  instead,  1  —  5  >  e,  then  the  density  /  has 
a  peak  in  subregion  Qi.  In  all  simulation  experiments  we  set 
£  =  0.1. 

A.  Heavy-Load  Performance  of  the  DC  Policy 

We  first  consider  a  uniform  spatial  density,  i.e.,  5=1  — 
e  =  0.9.  We  consider  5  values  of  the  load  factor,  namely 
Q  =  0.9,0.93,0.95,0.97,0.99.  Circles  in  the  top  figure  of 
Figure  1  represent  the  ratios  between  the  experimental  system 
time  Tdc(1)  (i-C-,  r  =  1  in  the  DC  policy)  and  (whose 
expression  is  given  in  equation  (4)),  for  the  different  values 
of  Q.  Squares  in  the  top  figure  of  Figure  1  represent  the  ratios 
between  the  experimental  system  time  Tdc(16)  (i.e.,  r  =  16 
in  the  DC  policy)  and  T^,  for  the  different  values  of  g.  It  can 
be  observed  that  for  r  =  1  the  ratios  Tdc(1)/F'u  decrease  as 
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5  =  0.9 


o 

Q 

Ih 


o 

Q 

Ih 


Q 

5  =  0.6 


spatial  density;  in  particular  we  consider  5  —  0.6,  i.e.,  a  peak  in 

the  small  subregion  Qi .  Note  that  in  this  case  a  simultaneously 

equitable  partition  can  be  trivially  obtained  by  using  radial 

cuts.  Results  are  shown  in  the  bottom  bgure  of  Figure  1. 

Circles  represent  the  ratios  between  the  experimental  system 
- * 

time  and  for  r  =  1,  while  squares  represent  the  ratios 
between  the  experimental  system  time  and  for  r  =  16.  As 
before,  the  results  are  in  accordance  with  the  upper  bound  in 
Theorem  4.2.  The  computation  times  are  very  similar  to  those 
shown  in  Table  III  and  therefore  they  are  omitted. 

Finally,  from  Figure  1  it  can  be  observed  that  as  p  — >^  the 
DC  policy  with  r  =  16  performs  better  than  the  DC  policy 
with  r  =  1,  precisely  by  a  factor  2  in  accordance  with  the 
upper  bound  in  Theorem  4.2.  On  the  other  hand,  for  moderate 
values  of  g,  say  g  «  0.9,  the  DC  policy  with  r  =  1  appears  to 
perform  slightly  better  than  the  DC  policy  with  r  =  16  (recall 
that  the  performance  bounds  hold  in  the  limit  g  —>■  1“). 


Fig.  1.  Top  Figure:  Ratio  between  experimental  system  time  under  the  DC 
policy  and  (whose  expression  is  given  in  equation  (4))  in  the  case  of 
uniform  density  (i.e.,  5  =  0.9).  Bottom  Figure:  Ratio  between  experimental 
system  time  under  the  DC  policy  and  Tjj  in  the  case  of  non-uniform  density 
(<5  =  0.6).  Circles  correspond  to  the  DC  policy  with  r  =  1,  while  squares 
correspond  to  the  DC  policy  with  r  =  16. 


TABLE  III 

Computation  times  for  DC  policy 


Q  =0.9 

e  =0.93 

Q  =0.95 

Q  =0.97 

Q  =0.99 

r  =  1 

~  0.3s 

~  0.8s 

~  1.1s 

~  1.7s 

~  5s 

r  =  16 

~  0.03s 

~  0.1s 

~  0.15s 

~  0.3s 

~  1.3s 

g  approaches  one  and  tend  to  2,  in  accordance  with  the  upper 
bound  in  Theorem  4.2  (in  fact,  1  +  1/r  =  2  in  this  case). 
Similarly,  for  r  =  16  the  ratios  Tdc{16)/T^j  decrease  as  g 
approaches  one  and  tend  to  (1  +  1/16)  «  1.06,  in  accordance 
again  with  the  upper  bound  in  Theorem  4.2.  These  results 
conhrm  the  theoretical  analysis  in  Section  IV  and  suggest  that 
the  upper  bound  (6)  is  indeed  tight  for  the  various  values 
of  r.  Note  that  for  all  values  of  g  the  experimental  system 
time  is  larger  than  the  upper  bound;  this  is  due  to  one  or  a 
combination  of  the  following  reasons:  First,  the  upper  bound 
formally  holds  only  in  the  limit  p  — >  1“.  Second,  we  are  using 
an  approximate  solution  for  the  optimal  TSR 

In  Table  III  we  show  for  both  r  =  1  and  r  =  16  the  average 
time  required  for  the  computation  of  a  TSP  tour.  Specihcally, 
the  computation  time  includes  (i)  the  time  to  read  the  points 
from  an  input  hie,  (ii)  the  TSP  computation  time,  and  (iii)  the 
time  to  save  the  tour  on  a  hie.  Recall  that  we  implemented 
the  DC  policy  by  using  linkern  as  a  solver  to  generate 
approximations  to  the  optimal  TSP  tour,  and  we  set  a  running 
time  bound  of  one  second  for  this  solver.  One  can  observe  that 
the  computation  times  are  rather  small  even  for  values  of  g 
very  close  to  one;  moreover,  since  the  experimental  results  are 
very  close  to  the  theoretical  bounds,  we  argue  that  linkern 
(with  a  running  time  bound  of  one  second)  computes  TSP 
tours  very  close  to  the  optimal  ones.  Hence,  we  conclude  that 
the  DC  policy  can  be  effectively  implemented  in  real-word 
applications  by  using  linkern  as  a  subroutine. 

The  same  set  of  simulations  is  performed  for  a  non-uniform 


B.  Heavy-Load  Performance  of  the  RH  Policy 

We  hrst  consider  a  uniform  spatial  density.  We  consider  3 
values  of  the  load  factor,  namely  g  =  0.95,0.97,0.99,  and  7 
values  of  rj,  namely  77  =  0.1,  0.2, 0.3,  0.7, 0.8,  0.9, 1.  Formally, 
the  RH  policy  is  not  dehned  for  p  =  1,  however,  by  setting 
77  =  1  in  the  dehnition  of  the  RH  policy,  we  simply  obtain 
the  DC  policy  with  r  =  1.  The  top  hgure  of  Figure  2  shows 
the  ratios  between  the  experimental  system  time  Trh(77)  and 
Ty  (whose  expression  is  given  in  equation  (4))  for  each  pair 
{g,  rf).  The  same  set  of  simulations  is  performed  for  a  non- 
uniform  spatial  density;  in  particular  we  consider  5  —  0.6. 
Results  are  shown  in  the  bottom  hgure  of  Figure  2.  The  results 
conhrm  that  the  system  time  Trh  follows  the  1/(1  — 
growth  predicted  by  Theorem  5.1.  In  Section  V  we  argued 
that  in  heavy  load  Trh (77)  <  (see  equation  (17))  for 

all  77  G  (0,  1);  from  Figure  2,  it  can  be  observed  that  as  g 
approaches  one  this  asymptotic  bound  seems  to  be  conhrmed, 
both  for  a  uniform  /  and  for  a  non-uniform  /.  We  also 
argued  in  Section  V  that  the  performance  of  the  RH  policy 
should  improve  as  the  horizon  becomes  shorter:  this  is  indeed 
the  case,  in  particular  simulation  results  show  that  optimal 
performance  is  achieved  for  77  «  0.2.  Finally,  the  computation 
times  for  the  RH  policy  with  any  77  are  very  similar  to  those 
of  the  DC  policy  with  r  =  1  and  therefore  they  are  omitted. 

The  RH  policy  should  be  compared  with  the  version  of  the 
DC  policy  that  is  also  adaptive  with  respect  to  /,  i.e.,  with 
the  DC  policy  with  r  =  1.  Indeed,  as  observed  before,  the  DC 
policy  with  r  =  1  is  the  same  as  the  RH  policy  when  we  set 
77  =  1.  Therefore,  we  can  compare  the  two  policies  by  looking 
at  Figure  2.  One  should  note  that  the  RH  policy  with  77  «  0.2 
performs  consistently  better  than  the  DC  policy  with  r  =  1 
(which,  again,  is  the  same  as  the  RH  policy  with  77  =  1),  in 
particular  it  decreases  the  system  time  by  «  20%. 

C.  Performance  of  the  Multi-Vehicle  DC  Policy 

In  this  section  we  study,  through  the  use  of  simulations,  the 
behavior  of  the  ttt-DC  policy;  we  focus,  in  particular,  on  the 
heavy-load  scenario  (in  light  load,  the  system  time  is  directly 
related  to  the  convergence  of  the  partitioning  algorithm  to 
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Fig.  2.  Top  Figure:  Ratio  between  experimental  system  time  under  the  RH 
policy  and  Tjj  (whose  expression  is  given  in  equation  (4))  in  the  case  of 
uniform  density  (i.e.,  5  =  0.9).  Bottom  Figure:  Ratio  between  experimental 
system  time  under  the  RH  policy  and  Tjj  in  the  case  of  non-uniform  density 
(<5  =  0.6). 


a  median  Voronoi  diagram,  for  which  we  refer  to  [28]). 
When  the  density  /  is  uniform,  a  partition  that  is  equitable 
with  respect  to  /  is  also  equitable  with  respect  to  By 

following  the  same  arguments  as  in  equation  (18)  one  can 
show  that  in  heavy  load  the  steady-state  system  time  within  the 
entire  environment  follows  the  same  behavior  of  the  steady- 
state  system  time  within  each  subregion.  Hence,  when  /  is 
uniform  and  the  system  is  in  heavy  load  the  simulation  results 
for  the  m-DC  policy  are  virtually  identical  to  the  simulation 
results  for  the  1-DC  policy  (presented  above),  and  therefore 
they  are  omitted. 

When  the  density  /  is  not  uniform,  a  partition  that  is 
equitable  with  respect  to  /  is  not  necessarily  equitable  with 
respect  to  Remark  6.6,  however,  guarantees  that  the  m- 
DC  policy  in  heavy  load  is  within  a  factor  (1  -f  1/r)  m  of  the 
optimal  unbiased  performance.  We  verify  this  result  through 
simulation  experiments.  We  consider  a  square  environment 
Q  =  {(x,  j/)  G  I  0  <  X  <  1,  0  <  2/  <  1}  (hence,  the  area 
of  Q  is  1),  vehicles’  velocity  v  =  1,  and  an  on-site  service 
time  uniformly  distributed  in  the  interval  [0,  1]  (thus  s  =  0.5). 
The  spatial  density  f  :  Q  ^  M>o  used  in  these  simulations  is 
the  piecewise  constant  function  /(x)  =  3  xQi  {x)  +  |  XQ2 
where  xs{x)  is  the  indicator  function  for  the  set  S  C  Q, 
Qi  =  {{x,y)  G  1 0.5  <  X  <  1,  0.5  <  y  <  1},  and 
Q2  =  Q  \  Qi-  We  consider  a  load  factor  g  =  0.9  and  4 
values  for  the  number  m  of  vehicles,  namely  m  =  2, 4, 6, 8. 
Within  each  subregion,  we  simulate  the  DC  policy  with  r  =  1, 
which  has  a  constant  factor  guarantee  equal  to  2  (see  Theorem 
4.2).  The  circles  in  Figure  3  represent  the  ratios  between  the 
experimental  system  time  T^-dc  and  T^,  for  the  different 
values  of  m.  The  squares  in  Figure  3  represent  the  upper 
bounds  for  such  ratios,  which,  according  to  Remark  6.6,  are 
equal  to  2  m.  The  simulation  results  show  that  the  ratios 
Tm-Dc/Tu  seem  not  to  increase  linearly  with  m,  or,  in  other 


words,  that  the  upper  bound  in  Remark  6.6  is  conservative. 
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Fig.  3.  The  circles  represent  the  ratios  between  the  experimental  system 
times  under  the  m-DC  policy  (with  r  =  1)  and  Tjj;  the  squares  represent  the 
theoretical  upper  bounds  on  these  ratios.  The  load  factor  is  ^  =  0.9. 


IX.  Conclusion 

The  m-DTRP  is  a  widely  applicable  model  for  dynamic 
task-assignment  and  scheduling  problems  in  robotic  networks. 
In  this  paper,  our  objective  was  to  design  unbiased  policies  for 
the  m-DTRP  that  (i)  are  adaptive  (in  particular  do  not  require 
the  knowledge  of  the  arrival  rate  A  and  the  statistics  of  the  on¬ 
site  service  time),  (ii)  enjoy  provable  performance  guarantees 
(in  particular,  are  provably  stable  under  any  load  condition), 
and  (iii)  are  distributed.  To  this  end,  we  introduced  two  new 
policies  for  the  1-DTRP,  namely  the  DC  policy  and  the  RH 
policy,  that  were  subsequently  extended  to  the  multi-vehicle 
case  through  the  idea  of  partitioning  policies.  Our  policies 
require  on-line  solutions  of  possibly  large  TSPs;  however,  we 
have  argued  in  Section  VIII  that  our  policies  can  be  effectively 
implemented  in  real-word  applications  by  using  1  inkern  as 
a  solver  to  generate  approximations  to  the  optimal  TSP  tour. 

The  focus  of  this  paper  was  on  the  m-DTRP.  Note,  however, 
that  in  many  dynamic  vehicle  routing  problems  (e.g.,  problems 
with  stochastic  time  constraints  [23]  and  with  priorities  [24]) 
partitioning  policies  provide  “good”  performance.  Hence,  fol¬ 
lowing  the  discussion  in  Section  VI,  we  argue  that  combining 
the  distributed  partitioning  algorithms  presented  in  [28]  with 
appropriate  partitioning  and  single-vehicle  routing  policies 
yields  adaptive  and  distributed  algorithms  for  a  wide  variety 
of  dynamic  vehicle  routing  problems.  The  approach  presented 
in  this  paper  is  therefore  rather  general  and  not  specifically 
tailored  to  the  m-DTRP. 

This  paper  leaves  numerous  important  extensions  open  for 
further  research.  First,  we  plan  to  devise  distributed  algorithms 
that  provide  simultaneously  equitable  partitions;  such  algo¬ 
rithms  would  make  the  m-DC  policy  (with  r  — -foo)  optimal 
for  any  density  /.  Second,  it  is  of  interest  to  design  optimal 
or  near-optimal  policies  that  are  biased  in  heavy  load  and 
that  are  adaptive  and  distributed  (recall  that  allowing  biased 
service  results  in  a  strict  reduction  of  the  optimal  system  time 
for  any  non-uniform  density  /).  Third,  most  of  the  results 
presented  in  this  paper  hold  in  asymptotic  regimes  (either 
p  ^  0+  or  p  — 1  );  addressing  optimality  of  performance 
in  the  intermediate  regime  would  be  very  important  both  on  a 
theoretical  and  on  a  practical  level.  Finally,  in  our  analysis,  we 
considered  omni-directional  vehicles  with  first  order  dynamics: 
more  realistic  dynamical  models  will  have  to  be  taken  into 
account  for  practical  application  to  UAVs  or  other  systems. 
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